Abstract. Recent empirical research indicates that many convex optimization problems with random constraints
exhibit a phase transition as the number of constraints increases. For example, this phenomenon emerges in the $\ell_1$
minimization  method  for  identifying  a  sparse  vector  from  random  linear  samples.  Indeed,  this  approach  succeeds 
with  high  probability  when  the  number  of  samples  exceeds  a  threshold  that  depends  on  the  sparsity  level;  otherwise, 
it  fails  with  high  probability. 

This  paper  provides  the  first  rigorous  analysis  that  explains  why  phase  transitions  are  ubiquitous  in  random 
convex  optimization  problems.  It  also  describes  tools  for  making  reliable  predictions  about  the  quantitative  aspects 
of  the  transition,  including  the  location  and  the  width  of  the  transition  region.  These  techniques  apply  to  regularized 
linear  inverse  problems  with  random  measurements,  to  demixing  problems  under  a  random  incoherence  model, 
and  also  to  cone  programs  with  random  affine  constraints. 

These  applications  depend  on  foundational  research  in  conic  geometry.  This  paper  introduces  a  new  summary 
parameter,  called  the  statistical  dimension,  that  canonically  extends  the  dimension  of  a  linear  subspace  to  the  class 
of  convex  cones.  The  main  technical  result  demonstrates  that  the  sequence  of  conic  intrinsic  volumes  of  a  convex 
cone  concentrates  sharply  near  the  statistical  dimension.  This  fact  leads  to  an  approximate  version  of  the  conic 
kinematic  formula  that  gives  bounds  on  the  probability  that  a  randomly  oriented  cone  shares  a  ray  with  a  fixed  cone. 


1.  Motivation  and  contributions 

A  phase  transition  is  a  sharp  change  in  the  character  of  a  computational  problem  as  its  parameters  vary. 
Recent  work  indicates  that  phase  transitions  emerge  in  a  variety  of  random  convex  optimization  problems 
from  mathematical  signal  processing  and  computational  statistics;  for  example,  see  [DT09b,  Sto09,  RFP10, 
CSPW11,  MT12,  DGM13].  This  paper  develops  geometric  tools  that  allow  us  to  identify  the  location  of  these 
phase  transitions  using  geometric  invariants  associated  with  the  mathematical  program.  Our  analysis  provides 
the  first  fully  rigorous  account  of  transition  phenomena  in  random  linear  inverse  problems,  random  demixing 
problems,  and  random  cone  programs. 

1.1.  Vignette:  Compressed  sensing.  To  illustrate  our  goals,  we  discuss  the  compressed  sensing  problem,  a 
familiar example where a phase transition is plainly visible in numerical experiments [DT09b]. Let $x_0 \in \mathbb{R}^d$ be
an unknown vector with $s$ nonzero entries. Suppose we have access to a vector $z_0 \in \mathbb{R}^m$ consisting of random
linear samples of $x_0$, where the number $m$ of samples is smaller than the ambient dimension $d$. More precisely,
assume that $z_0 = A x_0$ where $A$ is an $m \times d$ matrix with independent standard normal entries. The aim is to
reconstruct $x_0$ given the measurement vector $z_0$, the measurement matrix $A$, and the prior knowledge that $x_0$
is  sparse.  One  may  regard  this  formulation  as  a  toy  model  for  understanding  when  it  is  possible  to  solve  an 
underdetermined  linear  inverse  problem  with  a  structured  unknown. 

A  well-established  approach  [CDS01,  CT06,  Don06a]  to  the  compressed  sensing  problem  is  the  method  of 
$\ell_1$ minimization:

$$\text{minimize} \quad \|x\|_1 \quad \text{subject to} \quad z_0 = Ax. \tag{1.1}$$

We say that the convex program (1.1) succeeds at solving the compressed sensing problem when it has a
unique optimal point $\hat{x}$ that equals the true unknown $x_0$; otherwise, it fails.
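As a rough illustration of program (1.1), the sketch below recasts it as a linear program by splitting $x = u - v$ with $u, v \ge 0$ and minimizing the sum of the entries of $u$ and $v$. This is a standard reformulation, not the experimental code described in Appendix A. The helper name `l1_minimize`, the problem sizes, and the choice of SciPy's `linprog` solver are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A, z):
    """Solve: minimize ||x||_1 subject to A x = z, as a linear program.

    Write x = u - v with u, v >= 0; then ||x||_1 = sum(u) + sum(v) at optimum.
    """
    m, d = A.shape
    cost = np.ones(2 * d)              # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])          # equality constraint: A u - A v = z
    res = linprog(cost, A_eq=A_eq, b_eq=z, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:d], res.x[d:]
    return u - v


# Hypothetical instance: d = 40, sparsity s = 3, m = 32 Gaussian samples.
rng = np.random.default_rng(0)
d, s, m = 40, 3, 32
x0 = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
x0[support] = rng.standard_normal(s)
A = rng.standard_normal((m, d))        # independent standard normal entries
z0 = A @ x0

x_hat = l1_minimize(A, z0)
recovery_error = np.max(np.abs(x_hat - x0))
print("largest entrywise recovery error:", recovery_error)
```

With this many samples relative to the sparsity level, the program is expected to recover $x_0$ up to solver precision. Reducing `m` toward the transition region makes recovery unreliable.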

Figure  1.1  depicts  the  results  of  a  computer  experiment  designed  to  estimate  the  empirical  probability 
that (1.1) succeeds (with respect to the randomness in $A$) as the sparsity level $s$ and the number $m$ of samples
range from zero to the ambient dimension $d$. The plot evinces that, for a given sparsity level $s$, the $\ell_1$


[Report Documentation Page (Standard Form 298). Report date: March 2013. Title: Living on the Edge: A Geometric Theory of Phase Transitions in Convex Optimization. Performing organization: California Institute of Technology, Department of Mechanical Engineering, Pasadena, CA 91125. Distribution: approved for public release; distribution unlimited. Security classification: unclassified. Number of pages: 46.]

D. AMELUNXEN, M. LOTZ, M. B. MCCOY, AND J. A. TROPP


[Figure 1.1 plot. Panel title: "Compressed sensing with $\ell_1$ minimization". Horizontal axis: number of nonzeros of $x_0$.]

Figure 1.1: Empirical phase transition in compressed sensing. The colormap indicates the empirical
probability that the $\ell_1$ minimization problem (1.1) successfully recovers a sparse vector $x_0 \in \mathbb{R}^{100}$ from the vector
$z_0 = A x_0$ of random linear measurements, where $A$ is a standard normal matrix. The probability of success
increases with brightness from certain failure (black) to certain success (white).


minimization  technique  (1.1)  almost  always  succeeds  when  we  have  an  adequate  number  m  of  samples, 
while  it  almost  always  fails  when  we  have  fewer  samples.  See  Appendix  A  for  the  experimental  details. 

Figure 1.1 raises several interesting questions about the performance of the $\ell_1$ minimization method for
solving the compressed sensing problem:

• What is the probability of success? For a given pair $(s, m)$ of parameters, can we estimate the
probability that (1.1) succeeds or fails?

• Does a phase transition exist? Is there a simple curve $m = \psi(s)$ that separates the parameter space
into regions where (1.1) succeeds or fails most of the time?

• Where is the edge of the phase transition? Can we find a formula for the location of this threshold
between success and failure?

• How wide is the transition region? For a given sparsity level $s$ and ambient dimension $d$, how big
is the range of $m$ where the probabilities of success and failure are comparable?

• Why does the transition exist? Is there a geometric explanation for the phase transition in compressed
sensing? Can we export this reasoning to understand other problems?

In  Section  10,  we  summarize  a  large  corpus  of  research  that  has  attempted  to  address  these  questions. 
Unfortunately,  the  current  results  are  fragmentary,  even  for  the  vanilla  compressed  sensing  problem.  This 
work  provides  a  detailed  answer  to  each  of  the  questions  we  have  posed. 

1.2.  Contributions.  We  approach  phase  transition  phenomena  by  studying  the  intrinsic  geometric  properties 
of  convex  cones.  This  attack  is  appropriate  for  the  compressed  sensing  problem  because  we  can  express  the 
optimality condition for (1.1) in terms of a descent cone of the $\ell_1$ norm. Our techniques apply more broadly
because  convex  cones  play  a  central  role  in  convex  optimization.  Let  us  summarize  the  main  contributions  of 
this  work. 

•  We  introduce  a  new  summary  parameter  for  convex  cones,  which  we  call  the  statistical  dimension. 
This  quantity  canonically  extends  the  linear  dimension  of  a  subspace  to  the  class  of  convex  cones. 
(See Definition 2.1, Proposition 4.1, Proposition 5.11, and Proposition 5.12.)

• We prove that a regularized linear inverse problem with random measurements, such as the compressed
sensing problem, must exhibit a phase transition as the number of random measurements
increases. The location and width of the transition are controlled by the statistical dimension of a cone
associated with the regularizer and the unknown. (Theorem II, Theorem 7.1, and Proposition 9.1.)

• The literature [MT12] describes convex programming methods for decomposing a superposition of
two structured, randomly oriented vectors into its constituents. These methods exhibit a phase transition
whose properties depend on the total statistical dimension of two convex cones. (Theorem III
and Theorem 7.1.)

•  A  cone  program  with  random  affine  constraints  displays  a  phase  transition  as  the  number  of  constraints 
increases.  We  can  predict  the  transition  using  the  statistical  dimension  of  the  cone.  (Theorem  8.1.) 

•  In  Section  4,  we  explain  how  to  compute  the  statistical  dimension  for  several  important  families  of 
cones  that  arise  in  convex  optimization  and  mathematical  signal  processing.  These  calculations  are 
not  substantially  novel,  but  we  provide  the  first  rigorous  proof  that  they  are  sharp  in  high  dimensions. 
(Propositions  4.2,  4.3,  4.7,  4.8,  and  4.9  and  Theorem  4.5.) 

These  applied  results  are  supported  by  foundational  research  in  conic  geometry.  Our  theoretical  analysis 
identifies  a  new  geometric  phenomenon  that  governs  all  the  phase  transitions  mentioned  above. 

•  We  prove  that  the  sequence  of  conic  intrinsic  volumes  of  a  general  convex  cone  concentrates  at  the 
statistical  dimension  of  the  cone.  This  result  provides  a  new  family  of  nontrivial  inequalities  among 
the  conic  intrinsic  volumes.  (Theorem  6.1  and  Section  6.1.) 

•  The  concentration  result  implies  an  approximate  version  of  the  kinematic  formula  for  cones.  This 
result  uses  the  statistical  dimension  to  bound  the  probability  that  a  randomly  oriented  cone  shares  a 
ray  with  a  fixed  cone.  (Theorem  I  and  Theorem  7.1.) 

As  an  added  bonus,  we  provide  the  final  ingredient  needed  to  resolve  a  series  of  conjectures  [DMM09a, 
DJM11,  DGM13]  about  the  coincidence  between  the  minimax  risk  of  denoising  and  the  location  of  phase 
transitions  in  linear  inverse  problems.  Indeed,  Oymak  &  Hassibi  [OH12]  have  recently  shown  that  the 
minimax  risk  is  equivalent  with  the  statistical  dimension  of  an  appropriate  cone,  and  our  results  prove  that 
the  phase  transition  occurs  at  precisely  this  spot.  See  Section  10.4  for  further  details. 

1.3.  Roadmap.  Section  2  explains  the  connection  between  conic  geometry  and  phase  transitions  in  two 
applications.  We  analyze  both  problems,  and  we  showcase  computer  experiments  that  confirm  the  accuracy 
of  our  predictions.  Section  3  introduces  notation  and  background  material.  In  Section  4,  we  explain  how  to 
compute  the  statistical  dimension  for  several  important  families  of  cones,  and  we  prove  that  the  calculations 
are  sharp.  Section  5  summarizes  the  central  concepts  from  conic  geometric  probability,  including  the  definition 
of  conic  intrinsic  volumes.  Sections  6  and  7  form  the  technical  kernel  of  the  paper.  Here,  we  establish  that  the 
sequence  of  conic  intrinsic  volumes  concentrates  at  the  statistical  dimension,  and  we  derive  an  approximate 
kinematic  formula  as  a  consequence.  Afterward,  we  return  to  applications.  Section  8  shows  that  cone 
programs  with  random  affine  constraints  exhibit  a  phase  transition.  Section  9  uses  our  theory  to  prove  that  a 
certain  linear  inverse  problem  is  bound  to  fail.  Finally,  we  canvass  the  related  work  in  Section  10. 

The  paper  contains  five  substantial  appendices,  which  house  many  of  the  technical  details.  Appendix  A 
provides  information  about  the  computer  experiments.  Appendix  B  justifies  the  methods  for  computing  the 
statistical  dimension  of  a  descent  cone.  Appendix  C  completes  several  statistical  dimension  calculations. 
Appendix  D  contains  most  of  the  analysis  required  to  show  that  the  sequence  of  conic  intrinsic  volumes 
concentrates.  Finally,  Appendix  E  establishes  a  connection  between  the  statistical  dimension  and  another 
geometric  quantity  called  the  Gaussian  width. 

2.  Conic  geometry  and  phase  transitions 

The  existence  of  phase  transitions  depends  on  a  striking  new  fact  about  the  geometry  of  cones.  Indeed,  our 
entire  approach  can  be  summarized  in  the  following  precept: 

Each convex cone $C$ admits a “dimension” parameter $\delta(C)$. For certain purposes, the cone behaves
like a linear subspace with dimension $[\delta(C)]$.

This section introduces the quantity $\delta(C)$, which we call the statistical dimension of the cone. Then we
state an approximate kinematic formula, which uses the statistical dimension to express the probability that
two randomly oriented cones intersect nontrivially. This theorem parallels familiar statements about the
intersection of two randomly oriented subspaces. Afterward, we apply the approximate kinematic formula
to explain why phase transitions arise in random instances of regularized linear inverse problems (such as
compressed sensing) and in a related class of demixing problems.

2.1.  The  statistical  dimension  of  a  convex  cone.  For  certain  purposes,  the  dimension  of  a  linear  subspace 
carries  all  pertinent  information  about  the  subspace.  In  particular,  we  can  resolve  questions  in  stochastic 
geometry,  such  as  the  probability  that  two  random  subspaces  have  a  nontrivial  intersection,  as  soon  as  we 
know  the  dimensions  of  the  subspaces. 

Consider  the  more  general  problem  of  determining  the  probability  that  a  randomly  oriented  subspace 
shares  a  ray  with  a  fixed  convex  cone.  Prima  facie,  it  seems  impossible  that  we  could  answer  this  question 
without  detailed  information  about  the  cone.  Nonetheless,  there  is  a  single  parameter  that  encapsulates  all 
the  relevant  geometry  of  the  cone. 

Definition 2.1 (Statistical dimension). The statistical dimension $\delta(C)$ of a closed convex cone $C \subset \mathbb{R}^d$ is defined
as

$$\delta(C) := \mathbb{E}\big[\|\Pi_C(g)\|^2\big], \tag{2.1}$$

where $g \in \mathbb{R}^d$ is a standard normal vector, $\|\cdot\|$ is the Euclidean norm, and $\Pi_C$ denotes the Euclidean projection
onto the cone $C$:

$$\Pi_C(x) := \arg\min\{\|x - y\| : y \in C\}.$$

We  define  the  statistical  dimension  of  a  general  convex  cone  to  be  the  statistical  dimension  of  its  closure. 

As  we  show  in  Section  5.3,  the  statistical  dimension  canonically  extends  the  linear  dimension  of  a  subspace 
to  the  class  of  convex  cones.  Let  us  highlight  a  few  properties  that  support  this  claim.  First,  the  statistical 
dimension of a subspace $L \subset \mathbb{R}^d$ satisfies

$$\delta(L) = \mathbb{E}\big[\|\Pi_L(g)\|^2\big] = \dim(L). \tag{2.2}$$

Second, the statistical dimension of a cone $C \subset \mathbb{R}^d$ is rotationally invariant:

$$\delta(UC) = \delta(C) \quad \text{for each orthogonal matrix } U \in \mathbb{R}^{d \times d}. \tag{2.3}$$

Third, the statistical dimension increases with the size of the cone in the sense that $C \subset K$ implies that
$\delta(C) \le \delta(K)$. See Proposition 4.1 for details about these and other properties of the statistical dimension.
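To make Definition 2.1 concrete, the following sketch estimates the statistical dimension by Monte Carlo for two cones whose projections are available in closed form: a coordinate subspace, where (2.2) predicts $\delta = \dim(L)$, and the nonnegative orthant, a self-dual cone for which $\delta = d/2$. The function names, the trial count, and the choice of cones are assumptions made for illustration. The estimates are approximate and depend on the random seed.

```python
import random


def project_onto_orthant(g):
    """Euclidean projection onto the nonnegative orthant: clip negatives to zero."""
    return [max(v, 0.0) for v in g]


def project_onto_coordinate_subspace(g, k):
    """Euclidean projection onto the span of the first k coordinate axes."""
    return [v if i < k else 0.0 for i, v in enumerate(g)]


def estimate_statistical_dimension(project, d, trials=4000, seed=0):
    """Monte Carlo estimate of E[ ||Pi_C(g)||^2 ] for a standard normal g in R^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        p = project(g)
        total += sum(v * v for v in p)
    return total / trials


d = 10
delta_orthant = estimate_statistical_dimension(project_onto_orthant, d)
delta_subspace = estimate_statistical_dimension(
    lambda g: project_onto_coordinate_subspace(g, 3), d
)
print("orthant estimate (exact value d/2 = 5):", delta_orthant)
print("3-dimensional subspace estimate (exact value 3):", delta_subspace)
```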

Section  4  explains  how  to  calculate  the  statistical  dimension  for  several  important  families  of  convex  cones. 
The examples include self-dual cones, circular cones, descent cones of the $\ell_1$ norm, and descent cones of the
Schatten 1-norm.¹ In all of these cases, we prove rigorously that our estimates for the statistical dimension are
sharp  in  high  dimensions. 

Remark  2.2  (Related  concepts).  The  statistical  dimension  of  a  cone  is  closely  related  to  its  Gaussian  width, 
another  summary  parameter  for  cones  that  has  been  proposed  in  the  literature  [RV08,  Sto09,  CRPW12]. 
See  Section  10.3  for  further  discussion  of  Gaussian  widths.  Recent  work  of  Oymak  &  Hassibi  [OH12]  also 
illuminates  a  connection  between  the  statistical  dimension  and  the  minimax  risk  for  denoising  problems;  see 
Section  10.4. 

2.2.  Kinematics  and  statistical  dimension.  Conic  integral  geometry  [SW08,  Ch.  6]  studies  properties  of 
convex  cones  that  are  invariant  under  rotation  and  reflection.  A  crowning  achievement  in  this  field  is  the 
conic  kinematic  formula  [SW08,  Thm.  6.5.6],  which  yields  the  exact  probability  that  two  randomly  oriented 
convex cones intersect nontrivially. As recognized in [Ame11, MT12], this result is tailor-made for studying
random  instances  of  convex  optimization  problems.  Our  main  applied  result  is  an  approximate  kinematic 
formula,  expressed  in  terms  of  the  statistical  dimension. 

Theorem I (Approximate kinematic formula). Fix a tolerance $\eta \in (0,1)$. Suppose that $C, K \subset \mathbb{R}^d$ are closed
convex cones, one of which is not a subspace. Draw an orthogonal matrix $Q \in \mathbb{R}^{d \times d}$ uniformly at random. Then

$$\delta(C) + \delta(K) \le d - a_\eta \sqrt{d} \implies \mathbb{P}\{C \cap QK = \{0\}\} \ge 1 - \eta;$$

$$\delta(C) + \delta(K) \ge d + a_\eta \sqrt{d} \implies \mathbb{P}\{C \cap QK = \{0\}\} \le \eta.$$

The quantity $a_\eta := 4\sqrt{\log(4/\eta)}$. For example, $a_{0.01} < 10$ and $a_{0.001} < 12$.
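The constants in Theorem I are easy to evaluate numerically. The sketch below computes $a_\eta = 4\sqrt{\log(4/\eta)}$ and reports which of the two regimes of the theorem applies for given statistical dimensions. The function names and the string labels are hypothetical, chosen only for this illustration.

```python
import math


def a_eta(eta):
    """Constant a_eta = 4 * sqrt(log(4 / eta)) from Theorem I, for 0 < eta < 1."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    return 4.0 * math.sqrt(math.log(4.0 / eta))


def kinematic_regime(delta_c, delta_k, d, eta):
    """Report which hypothesis of Theorem I holds for the given dimensions.

    Returns a label only; it does not compute the intersection probability.
    """
    margin = a_eta(eta) * math.sqrt(d)
    total = delta_c + delta_k
    if total <= d - margin:
        return "trivial intersection with probability >= 1 - eta"
    if total >= d + margin:
        return "trivial intersection with probability <= eta"
    return "transition window: Theorem I gives no bound"


print(a_eta(0.01), a_eta(0.001))
print(kinematic_regime(100.0, 100.0, 10000, 0.01))
print(kinematic_regime(6000.0, 6000.0, 10000, 0.01))
print(kinematic_regime(5000.0, 5000.0, 10000, 0.01))
```

Because the margin scales like $\sqrt{d}$, the window in which the theorem is silent occupies a vanishing fraction of the range $[0, d]$ as the ambient dimension grows.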

¹ The Schatten 1-norm equals the sum of the singular values of a matrix.

Theorem I allows us to control the probability that two randomly oriented cones strike as soon as we know
their statistical dimensions. Roughly, the cones are likely to share a ray if and only if the total dimension of the
two cones exceeds the ambient dimension. This claim is in perfect sympathy with the analogous statement
about subspaces. In the next few sections, we explore applications of this pluripotent result.

Section  7  contains  the  proof  of  Theorem  I,  which  has  two  main  ingredients.  The  first  piece  is  the  conic 
kinematic  formula,  which  expresses  the  exact  intersection  probability  in  terms  of  geometric  invariants  called 
conic intrinsic volumes [SW08, Sec. 6.5]. Each cone in $\mathbb{R}^d$ has $d + 1$ conic intrinsic volumes, much as a convex
set in $\mathbb{R}^d$ has a volume, a surface area, a mean width, and so forth [KR97]. The second component of the proof
is a new property of conic intrinsic volumes (Theorem 6.1). This result shows that most intrinsic volumes
of a cone $C$ are negligible in size, except for the ones whose index is close to the statistical dimension $\delta(C)$.
Combining  these  facts,  we  can  bound  the  intersection  probability  accurately  via  the  statistical  dimension. 

2.3.  Regularized  linear  inverse  problems.  Our  first  application  of  Theorem  I  concerns  a  generalization  of 
the compressed sensing problem. A linear inverse problem asks us to infer an unknown vector $x_0 \in \mathbb{R}^d$ from an
observed vector $z_0 \in \mathbb{R}^m$ of the form

$$z_0 = A x_0, \tag{2.4}$$

where $A \in \mathbb{R}^{m \times d}$ is a matrix that describes a linear data acquisition process. When the matrix is fat ($m < d$),
the inverse problem is underdetermined. As a consequence, we cannot hope to identify $x_0$ unless we exploit
some prior information about its structure.

To  solve  the  underdetermined  linear  inverse  problem,  we  consider  an  approach  based  on  optimization. 
Suppose that $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ is a proper convex function² that reflects the amount of “structure” in a vector. We
can attempt to identify the structured unknown $x_0$ in (2.4) by solving a convex optimization problem:

$$\text{minimize} \quad f(x) \quad \text{subject to} \quad z_0 = Ax. \tag{2.5}$$

The function $f$ is called a regularizer, and the formulation (2.5) is called a regularized linear inverse problem.
To illustrate the kinds of regularizers that arise in practice, we highlight two familiar examples.

Example 2.3 (Sparse vectors). When the vector $x_0$ is known to be sparse, we can minimize the $\ell_1$ norm to
look for sparse solutions of the inverse problem. Repeating (1.1), we have the optimization

$$\text{minimize} \quad \|x\|_1 \quad \text{subject to} \quad z_0 = Ax. \tag{2.6}$$

This  approach  was  proposed  by  Chen  et  al.  [CDS01],  motivated  by  work  in  geophysics  [CM73,  SS86]. 

Example 2.4 (Low-rank matrices). Suppose that $X_0$ is a low-rank matrix, and we have acquired a vector of
measurements of the form $z_0 = \mathcal{A}(X_0)$, where $\mathcal{A}$ is a linear operator. This process is equivalent to (2.4). We
can look for low-rank solutions to the linear inverse problem by minimizing the Schatten 1-norm:

$$\text{minimize} \quad \|X\|_{S_1} \quad \text{subject to} \quad z_0 = \mathcal{A}(X). \tag{2.7}$$

This  idea  was  proposed  by  Recht  et  al.  [RFP10],  motivated  by  work  in  control  theory  [MP97,  Faz02]. 

The paper [CRPW12] presents a general framework for constructing a regularizer $f$ that promotes a specified
type of structure, as well as many additional examples.

We say that the regularized inverse problem (2.5) succeeds at solving (2.4) when the convex program has a
unique minimizer $\hat{x}$ that coincides with the true unknown; that is, $\hat{x} = x_0$. To develop conditions for success,
we introduce a convex cone associated with the regularizer $f$ and the unknown $x_0$.

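Definition 2.5 (Descent cone). The descent cone $\mathcal{D}(f, x)$ of a function $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ at a point $x \in \mathbb{R}^d$ is the
conic hull of the perturbations that do not increase $f$ near $x$:

$$\mathcal{D}(f, x) := \bigcup_{\tau > 0} \{ y \in \mathbb{R}^d : f(x + \tau y) \le f(x) \}.$$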
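For the $\ell_1$ norm, membership in the descent cone can be tested directly. Because the $\ell_1$ norm is piecewise linear, a direction $y$ lies in $\mathcal{D}(\|\cdot\|_1, x)$ exactly when the directional derivative $\sum_{i : x_i \neq 0} \operatorname{sign}(x_i)\, y_i + \sum_{i : x_i = 0} |y_i|$ is nonpositive. The sketch below implements this test and compares it against the definition using one small step $\tau$. The function names and the example vectors are illustrative assumptions.

```python
def l1_norm(x):
    return sum(abs(v) for v in x)


def in_l1_descent_cone(x, y, tol=1e-12):
    """Test whether y lies in the descent cone of the l1 norm at x.

    Uses the directional derivative: on the support of x the norm changes
    at rate sign(x_i) * y_i; off the support it changes at rate |y_i|.
    """
    slope = 0.0
    for xi, yi in zip(x, y):
        if xi > 0:
            slope += yi
        elif xi < 0:
            slope -= yi
        else:
            slope += abs(yi)
    return slope <= tol


def decreases_for_step(x, y, tau=1e-3, tol=1e-12):
    """Check the defining inequality f(x + tau * y) <= f(x) for one small tau."""
    moved = [xi + tau * yi for xi, yi in zip(x, y)]
    return l1_norm(moved) <= l1_norm(x) + tol


x = [2.0, 0.0, -1.0]
y_descent = [-1.0, 0.5, 0.25]    # slope = -1 + 0.5 - 0.25 = -0.75
y_ascent = [-1.0, 2.0, 0.0]      # slope = -1 + 2 = 1

print(in_l1_descent_cone(x, y_descent), decreases_for_step(x, y_descent))
print(in_l1_descent_cone(x, y_ascent), decreases_for_step(x, y_ascent))
```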

Chandrasekaran  et  al.  [CRPW12,  Prop.  2.1]  characterize  when  the  optimization  problem  (2.5)  succeeds  in 
terms  of  a  descent  cone.  This  result  is  simply  a  geometric  statement  of  the  primal  optimality  condition. 

Fact 2.6 (Optimality condition for linear inverse problems). The vector $x_0$ is the unique optimal point of the
convex program (2.5) if and only if $\mathcal{D}(f, x_0) \cap \operatorname{null}(A) = \{0\}$.

² The extended real numbers $\overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$. A proper convex function has at least one finite value, and it never takes the value $-\infty$.

Figure 2.1: Geometry of optimality conditions. [left] The optimality condition for the regularized linear
inverse problem (2.5) states that the descent cone of $f$ at $x_0$ is tangent to the null space of $A$. [right] The
optimality condition for the convex demixing method (2.9) states that the descent cone of $f$ at $x_0$ has a trivial
intersection with a rotated copy of the descent cone of $g$ at $y_0$.


Figure  2.1  [left]  illustrates  the  geometry  of  this  optimality  condition.  Despite  its  simplicity,  this  result  forges  a 
crucial  link  between  the  convex  optimization  problem  (2.5)  and  the  theory  of  conic  integral  geometry. 

Our  goal  is  to  understand  the  power  of  convex  regularization  for  solving  linear  inverse  problems,  as  well  as 
the  limitations  inherent  in  this  approach.  To  do  so,  we  consider  the  case  where  the  measurements  are  generic. 
A  natural  modeling  technique  is  to  draw  the  measurement  matrix  A  at  random  from  the  standard  normal 
distribution on $\mathbb{R}^{m \times d}$. For this model, Theorem I allows us to identify a sharp transition in the performance of
the  regularized  problem  (2.5). 

Theorem II (Phase transitions in linear inverse problems). Fix a tolerance $\eta \in (0,1)$. Let $x_0 \in \mathbb{R}^d$ be a fixed
vector. Suppose $A \in \mathbb{R}^{m \times d}$ has independent standard normal entries, and let $z_0 = A x_0$. Then

$$m \ge \delta(\mathcal{D}(f, x_0)) + a_\eta \sqrt{d} \implies \text{(2.5) succeeds with probability} \ge 1 - \eta;$$

$$m \le \delta(\mathcal{D}(f, x_0)) - a_\eta \sqrt{d} \implies \text{(2.5) succeeds with probability} \le \eta.$$

The quantity $a_\eta := 4\sqrt{\log(4/\eta)}$.

Proof. The standard normal distribution on $\mathbb{R}^{m \times d}$ is invariant under rotation, so the null space $L = \operatorname{null}(A)$
is almost surely a uniformly random $(d - m)$-dimensional subspace of $\mathbb{R}^d$. According to (2.2), the statistical
dimension $\delta(L) = d - m$ almost surely. The result follows immediately when we combine the optimality
condition, Fact 2.6, and the kinematic bound, Theorem I. □
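The substitution that the proof leaves implicit can be written out as follows. This is a sketch that uses only Fact 2.6, identity (2.2), and Theorem I, with $L = \operatorname{null}(A)$ playing the role of the randomly oriented cone.

```latex
\begin{align*}
\text{(2.5) succeeds}
  &\iff \mathcal{D}(f, x_0) \cap L = \{0\},
  \qquad L = \operatorname{null}(A), \quad \delta(L) = d - m, \\
\delta(\mathcal{D}(f, x_0)) + \delta(L) \le d - a_\eta \sqrt{d}
  &\iff m \ge \delta(\mathcal{D}(f, x_0)) + a_\eta \sqrt{d}, \\
\delta(\mathcal{D}(f, x_0)) + \delta(L) \ge d + a_\eta \sqrt{d}
  &\iff m \le \delta(\mathcal{D}(f, x_0)) - a_\eta \sqrt{d}.
\end{align*}
```

In the first case Theorem I bounds the probability of a trivial intersection below by $1 - \eta$, and in the second case it bounds that probability above by $\eta$, which gives the two statements of Theorem II.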

Theorem II proves that we always encounter a phase transition when we use the regularized formulation (2.5)
to solve the linear inverse problem (2.4) with random measurements. The transition occurs where
the number of measurements equals the statistical dimension of the descent cone: $m = \delta(\mathcal{D}(f, x_0))$. The shift
from failure to success takes place over a range of $O(\sqrt{d})$ measurements.

In Section 4.5, we estimate the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector
of fixed dimension. Owing to symmetries of the $\ell_1$ norm, only the sparsity and the ambient dimension play a
role in this calculation. When combined with Theorem II, this result yields the exact (asymptotic) location
of the phase transition for the $\ell_1$ minimization problem (2.6) with random measurements. In Section 4.6,
we estimate the statistical dimension of the descent cone of the Schatten 1-norm at a low-rank matrix with
fixed dimensions. This calculation gives the exact (asymptotic) location of the phase transition for the $S_1$
minimization problem (2.7) with random measurements.

Figure 2.2: Phase transitions for linear inverse problems. [left] Recovery of sparse vectors. The empirical
probability that the $\ell_1$ minimization problem (2.6) identifies a sparse vector $x_0 \in \mathbb{R}^{100}$ given random linear
measurements $z_0 = A x_0$. [right] Recovery of low-rank matrices. The empirical probability that the $S_1$
minimization problem (2.7) identifies a low-rank matrix $X_0 \in \mathbb{R}^{30 \times 30}$ given random linear measurements
$z_0 = \mathcal{A}(X_0)$. In each panel, the colormap indicates the empirical probability of success (black = 0%; white =
100%). The yellow curve marks the theoretical prediction of the phase transition from Theorem II; the red curve
traces the empirical phase transition.

To  underscore  these  achievements,  we  have  performed  some  computer  experiments  to  compare  the 
theoretical and empirical phase transitions. Figure 2.2 [left] shows the performance of (2.6) for identifying
a sparse vector in $\mathbb{R}^{100}$; Figure 2.2 [right] shows the performance of (2.7) for identifying a low-rank matrix
in $\mathbb{R}^{30 \times 30}$. In each case, the colormap indicates the empirical probability of success over the randomness in
the  measurement  operator.  The  empirical  5%,  50%,  and  95%  success  isoclines  are  determined  from  the 
data.  We  also  draft  the  theoretical  phase  transition  curve,  promised  by  Theorem  II,  where  the  number  m  of 
measurements  equals  the  statistical  dimension  of  the  appropriate  descent  cone,  which  we  compute  using  the 
formulas  from  Sections  4.5  and  4.6.  See  Appendix  A  for  the  experimental  protocol. 

In  both  examples,  the  theoretical  prediction  of  Theorem  II  coincides  almost  perfectly  with  the  50%  success 
isocline. Furthermore, the phase transition takes place over a range of $O(\sqrt{d})$ values of $m$, as promised.
Although  Theorem  II  does  not  explain  why  the  transition  region  tapers  at  the  bottom-left  and  top-right 
corners  of  each  plot,  we  have  established  a  more  detailed  version  of  Theorem  I  that  allows  us  to  predict  this 
phenomenon  as  well.  See  the  discussion  after  Theorem  7.1  for  more  information. 

2.4. Demixing problems. In a demixing problem [MT12], we observe a superposition of two structured vectors, and we aim to extract the two constituents from the mixture. More precisely, suppose that we measure a vector $z_0 \in \mathbb{R}^d$ of the form

$$z_0 = x_0 + U y_0 \tag{2.8}$$

where $x_0, y_0 \in \mathbb{R}^d$ are unknown and $U \in \mathbb{R}^{d \times d}$ is a known orthogonal matrix. If we wish to identify the pair $(x_0, y_0)$, we must assume that each component is structured to reduce the number of degrees of freedom. In addition, if the two types of structure are coherent (i.e., aligned with each other), it may be impossible to disentangle them, so it is expedient to include the matrix $U$ to model the relative orientation of the two constituent signals.

To solve the demixing problem (2.8), we describe a convex programming technique proposed in [MT12]. Suppose that $f$ and $g$ are proper convex functions on $\mathbb{R}^d$ that promote the structures we expect to find in $x_0$ and $y_0$. Then we can frame the convex optimization problem

$$\text{minimize } f(x) \quad\text{subject to}\quad g(y) \le g(y_0) \ \text{ and } \ z_0 = x + U y. \tag{2.9}$$

In other words, we seek structured vectors $x$ and $y$ that are consistent with the observation $z_0$. This approach requires the side information $g(y_0)$, so a Lagrangian formulation is sometimes more natural in practice. Here are two concrete examples of the demixing program (2.9) that are adapted from the literature.

Example 2.7 (Sparse + sparse). Suppose that the first signal $x_0$ is sparse in the standard basis, and the second signal $U y_0$ is sparse in a known basis $U$. In this case, we can use $\ell_1$ norms to promote sparsity, which leads to the optimization

$$\text{minimize } \|x\|_1 \quad\text{subject to}\quad \|y\|_1 \le \|y_0\|_1 \ \text{ and } \ z_0 = x + U y. \tag{2.10}$$

This approach for demixing sparse signals is sometimes called morphological component analysis [SDC03, SED05, ESQD05, BMS06].

Example 2.8 (Low-rank + sparse). Suppose that we observe $Z_0 = X_0 + \mathcal{U}(Y_0)$ where $X_0$ is a low-rank matrix, $Y_0$ is a sparse matrix, and $\mathcal{U}$ is a known orthogonal transformation on the space of matrices. We can minimize the Schatten 1-norm to promote low rank, and we can constrain the $\ell_1$ norm to promote sparsity. The optimization becomes

$$\text{minimize } \|X\|_{S_1} \quad\text{subject to}\quad \|Y\|_1 \le \|Y_0\|_1 \ \text{ and } \ Z_0 = X + \mathcal{U}(Y). \tag{2.11}$$

This demixing problem is called rank-sparsity decomposition [CSPW11].

See  the  paper  [MT12]  for  some  additional  examples. 

We say that the convex program (2.9) for demixing succeeds when it has a unique solution $(\hat{x}, \hat{y})$ that coincides with the true unknown: $(\hat{x}, \hat{y}) = (x_0, y_0)$. As in the case of a linear inverse problem, we can express the primal optimality condition in terms of descent cones [MT12, Lem. 2.3].

Fact 2.9 (Optimality condition for demixing). The pair $(x_0, y_0)$ is the unique optimal point of the convex program (2.9) if and only if $\mathcal{D}(f, x_0) \cap (-U \mathcal{D}(g, y_0)) = \{0\}$.

Figure  2.1  [right]  depicts  the  geometry  of  this  optimality  condition.  The  parallel  with  Fact  2.6,  the  optimality 
condition  for  linear  inverse  problems,  is  striking.  Indeed,  the  two  conditions  coalesce  when  the  function  g 
in  (2.9)  is  the  indicator  of  an  affine  space. 

Our  goal  is  to  understand  the  prospects  for  solving  the  demixing  problem  (2.8)  with  a  convex  program  of 
the  form  (2.9).  To  that  end,  we  use  randomness  to  model  the  favorable  case  where  the  two  structures  have  no 
interaction  with  each  other.  More  precisely,  we  draw  the  matrix  U  uniformly  at  random  from  the  orthogonal 
group.  Under  this  assumption,  Theorem  I  delivers  a  sharp  transition  in  the  performance  of  the  optimization 
problem  (2.9). 

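Theorem III (Phase transitions in demixing). Fix a tolerance $\eta \in (0,1)$. Let $x_0$ and $y_0$ be fixed vectors in $\mathbb{R}^d$. Draw an orthogonal matrix $U \in \mathbb{R}^{d \times d}$ uniformly at random, and let $z_0 = x_0 + U y_0$. Then

$$\delta(\mathcal{D}(f, x_0)) + \delta(\mathcal{D}(g, y_0)) \le d - a_\eta \sqrt{d} \implies \text{(2.9) succeeds with probability} \ge 1 - \eta;$$

$$\delta(\mathcal{D}(f, x_0)) + \delta(\mathcal{D}(g, y_0)) \ge d + a_\eta \sqrt{d} \implies \text{(2.9) succeeds with probability} \le \eta.$$

The quantity $a_\eta := 4\sqrt{\log(4/\eta)}$.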

Proof. This theorem follows immediately when we combine the optimality condition, Fact 2.9, with the
kinematic  bound,  Theorem  I.  We  invoke  the  rotational  invariance  (2.3)  of  the  statistical  dimension  to  simplify 
the  formulas.  □ 

Theorem  III  establishes  that  there  is  always  a  phase  transition  when  we  use  the  convex  program  (2.9) 
to  solve  the  demixing  problem  (2.8)  under  the  isotropic  random  model  for  U.  It  is  accurate  to  say  that  the 
optimization  is  effective  if  and  only  if  the  total  statistical  dimension  of  the  two  descent  cones  is  smaller  than 
the  ambient  dimension  d. 
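
To make the statement of Theorem III concrete, the following Python sketch evaluates the half-width of the transition window and classifies a pair of statistical dimensions. It is only an illustration: the function names and the parameter name `eta` (the tolerance in Theorem III) are our own, and the statistical dimensions are assumed to be supplied by the user, for example from the formulas in Sections 4.5 and 4.6.

```python
import math


def transition_halfwidth(d: int, eta: float) -> float:
    """Return a_eta * sqrt(d), where a_eta = 4 * sqrt(log(4 / eta)) as in Theorem III."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    a_eta = 4.0 * math.sqrt(math.log(4.0 / eta))
    return a_eta * math.sqrt(d)


def demixing_regime(delta_f: float, delta_g: float, d: int, eta: float) -> str:
    """Classify the total statistical dimension against the window d +/- a_eta * sqrt(d).

    delta_f and delta_g stand for the statistical dimensions of the two descent cones.
    """
    total = delta_f + delta_g
    w = transition_halfwidth(d, eta)
    if total <= d - w:
        return "success"        # (2.9) succeeds with probability >= 1 - eta
    if total >= d + w:
        return "failure"        # (2.9) succeeds with probability <= eta
    return "undetermined"       # inside the transition window; the theorem is silent
```

With d = 10000 and tolerance 0.05 the half-width is roughly 837, so the window is narrow relative to d. For d = 100 the half-width is about 84, so the explicit constant is conservative in small dimensions even though the empirical transition is much sharper.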

Our numerical work confirms the analysis in Theorem III. Figure 2.3 [left] shows when (2.10) can demix a sparse vector from a vector that is sparse in a random basis for $\mathbb{R}^{100}$. Figure 2.3 [right] shows when (2.11) can demix a low-rank matrix from a matrix that is sparse in a random basis for $\mathbb{R}^{35 \times 35}$. In each case, the experiment provides an empirical estimate for the probability of success with respect to the randomness in the measurement operator. We display the 5%, 50%, and 95% success isoclines, determined from the data. We also sketch the theoretical phase transition from Theorem III, which occurs when the total statistical dimension of the relevant cones equals the ambient dimension. The statistical dimensions of the descent cones are obtained from the formulas in Sections 4.5 and 4.6. See [MT12, Sec. 6] for the details of the experimental protocol.

Figure 2.3: Phase transitions for demixing problems. [left] Sparse + sparse. The empirical probability that the convex program (2.10) successfully demixes a vector $x_0 \in \mathbb{R}^{100}$ that is sparse in the standard basis from a vector $U y_0 \in \mathbb{R}^{100}$ that is sparse in the random basis $U$. [right] Low rank + sparse. The empirical probability that the convex program (2.11) successfully demixes a low-rank matrix $X_0 \in \mathbb{R}^{35 \times 35}$ from a matrix $\mathcal{U}(Y_0) \in \mathbb{R}^{35 \times 35}$ that is sparse in the random basis $\mathcal{U}$. In each panel, the colormap indicates the empirical probability of success (black = 0%; white = 100%). The yellow curve marks the theoretical phase transition predicted by Theorem III. The red curve follows the empirical phase transition.

Once  again,  we  see  that  the  theoretical  curve  of  Theorem  III  coincides  almost  perfectly  with  the  empirical 
50% success isocline. The width of the transition region is $O(\sqrt{d})$. Theorem III does not predict the tapering
of the transition in the top-left and bottom-right corners, but the discussion after Theorem 7.1 exposes the
underlying  reason  for  this  phenomenon. 

3.  Notation,  conventions,  and  background 

This  section  contains  a  short  overview  of  our  notation,  as  well  as  some  important  facts  from  convex 
geometry that we will use liberally. Some standard references include Rockafellar [Roc70], Hiriart-Urruty &
Lemaréchal [HUL93a, HUL93b], and Rockafellar & Wets [RW98].

3.1.  Vectors  and  matrices.  We  use  boldface  lowercase  letters  to  denote  vectors  and  boldface  capital  letters 
to  denote  matrices,  so  x  is  a  vector  and  X  is  a  matrix.  We  write  I  for  the  identity  matrix  and  0  for  the  zero 
vector  or  matrix;  their  dimensions  are  determined  by  context. 

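3.2. Euclidean geometry. For vectors $x, y \in \mathbb{R}^d$, define the Euclidean inner product $\langle x, y \rangle$ and the squared Euclidean norm $\|x\|^2 := \langle x, x \rangle$. The Euclidean unit ball and unit sphere are respectively denoted as

$$\mathsf{B}^d := \{x \in \mathbb{R}^d : \|x\| \le 1\} \quad\text{and}\quad \mathsf{S}^{d-1} := \{x \in \mathbb{R}^d : \|x\| = 1\}.$$

The group of $d \times d$ orthogonal matrices is defined as

$$\mathsf{O}_d := \{U \in \mathbb{R}^{d \times d} : U U^{\mathsf{T}} = I\}.$$

An orthogonal basis for $\mathbb{R}^d$ is an element of $\mathsf{O}_d$.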
3.3. Convex cones. Recall that a convex cone $C$ is a set that is convex and positive homogeneous: $C = \tau C$ for all $\tau > 0$. A convex cone is polyhedral if it can be written as the intersection of a finite number of closed halfspaces. We introduce the family $\mathcal{C}_d$ of all nonempty, closed, convex cones in $\mathbb{R}^d$.

For a general convex cone $C \subset \mathbb{R}^d$, the polar cone $C^\circ \in \mathcal{C}_d$ is the closed convex cone

$$C^\circ := \{u \in \mathbb{R}^d : \langle u, x \rangle \le 0 \text{ for all } x \in C\}.$$

The normal cone $\mathcal{N}(S, x)$ of a convex set $S \subset \mathbb{R}^d$ at a point $x \in S$ consists of the outward normals to all hyperplanes that support $S$ at $x$, i.e.,

$$\mathcal{N}(S, x) := \{u \in \mathbb{R}^d : \langle u, y - x \rangle \le 0 \text{ for all } y \in S\}. \tag{3.1}$$

3.4. Representation of descent cones. Let $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ be a proper convex function. Recall that the descent cone of $f$ at a point $x$ is given by

$$\mathcal{D}(f, x) := \bigcup_{\tau > 0} \{y \in \mathbb{R}^d : f(x + \tau y) \le f(x)\}.$$

The polar of a descent cone has some attractive properties that we will exploit later. First, the polar of a descent cone coincides with the normal cone of a sublevel set:

$$\mathcal{D}(f, x)^\circ = \mathcal{N}(S, x) \quad\text{where}\quad S := \{y \in \mathbb{R}^d : f(y) \le f(x)\}. \tag{3.2}$$

Next, we introduce the subdifferential $\partial f(x)$, which is the closed convex set

$$\partial f(x) := \{u \in \mathbb{R}^d : f(y) \ge f(x) + \langle u, y - x \rangle \text{ for all } y \in \mathbb{R}^d\}.$$

In particular, the subdifferential contains the origin $0$ if and only if $x$ minimizes $f$. Assuming that the subdifferential $\partial f(x)$ is nonempty, compact, and does not contain the origin, the result [Roc70, Cor. 23.7.1] provides that

$$\mathcal{D}(f, x)^\circ = \operatorname{cone}(\partial f(x)) := \bigcup_{\tau \ge 0} \tau \cdot \partial f(x). \tag{3.3}$$

The expression $\tau \cdot \partial f(x)$ represents dilation of the subdifferential by a factor $\tau$. The relation (3.3) offers a powerful tool for computing the statistical dimension of a descent cone. Related identities hold under weaker technical conditions [Roc70, Thm. 23.7].

3.5. Euclidean projections onto sets. Let $S \subset \mathbb{R}^d$ be a closed convex set. The Euclidean distance to the set $S$ is the function

$$\operatorname{dist}(\cdot, S) : \mathbb{R}^d \to \mathbb{R}_+ \quad\text{where}\quad \operatorname{dist}(x, S) := \inf\{\|x - y\| : y \in S\}.$$

The Euclidean projection onto the set $S$ is the map

$$\pi_S : \mathbb{R}^d \to S \quad\text{where}\quad \pi_S(x) := \arg\min\{\|x - y\| : y \in S\}.$$

The projection takes a well-defined value because the norm is strictly convex. Let us note some properties of these maps. First, the function $\operatorname{dist}(\cdot, S)$ is convex [Roc70, p. 34]. Next, the maps $\pi_S$ and $I - \pi_S$ are nonexpansive with respect to the Euclidean norm [Roc70, Thm. 31.5 et seq.]:

$$\|\pi_S(x) - \pi_S(y)\| \le \|x - y\| \quad\text{and}\quad \|(I - \pi_S)(x) - (I - \pi_S)(y)\| \le \|x - y\| \quad\text{for all } x, y \in \mathbb{R}^d. \tag{3.4}$$

As a consequence, the projection $\pi_S$ is continuous, and the distance function is 1-Lipschitz with respect to the Euclidean norm:

$$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| \le \|x - y\| \quad\text{for all } x, y \in \mathbb{R}^d. \tag{3.5}$$

The squared distance is differentiable everywhere, and the derivative satisfies

$$\nabla \operatorname{dist}^2(x, S) = 2\,(x - \pi_S(x)) \quad\text{for all } x \in \mathbb{R}^d. \tag{3.6}$$

This point follows from [RW98, Thm. 2.26].
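
As a quick sanity check on the gradient formula (3.6), the sketch below compares it with a central finite difference for one simple closed convex set, the box $[-1, 1]^d$, whose projection is coordinatewise clipping. The choice of the box and all function names are ours, made only for illustration.

```python
def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (coordinatewise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def dist2_box(x):
    """Squared Euclidean distance from x to the box [-1, 1]^d."""
    p = project_box(x)
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))


def grad_dist2_formula(x):
    """Right-hand side of (3.6): 2 * (x - pi_S(x))."""
    p = project_box(x)
    return [2.0 * (xi - pi) for xi, pi in zip(x, p)]


def grad_dist2_numeric(x, h=1e-6):
    """Central finite-difference approximation of the gradient of dist^2(., S)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((dist2_box(xp) - dist2_box(xm)) / (2.0 * h))
    return grad
```

At a point such as `[2.0, -0.3, -1.7, 0.9]`, the two gradients agree up to rounding error: coordinates inside the box contribute zero, and coordinates outside contribute twice their excess.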
3.6. Euclidean projections onto cones. Let $C \in \mathcal{C}_d$ be a closed convex cone. Recall that the Euclidean projection onto the cone $C$ is the map

$$\Pi_C : \mathbb{R}^d \to C \quad\text{where}\quad \Pi_C(x) := \arg\min\{\|x - y\| : y \in C\}.$$

We have used a separate notation for the projection onto a set because the projection onto a cone enjoys a number of additional properties [HUL93a, Sec. III.3.2]. First, the projection onto a cone is nonnegative homogeneous: $\Pi_C(\tau x) = \tau\, \Pi_C(x)$ for all $\tau \ge 0$. Next, the cone $C$ induces an orthogonal decomposition of $\mathbb{R}^d$:

$$x = \Pi_C(x) + \Pi_{C^\circ}(x) \quad\text{and}\quad \langle \Pi_C(x), \Pi_{C^\circ}(x) \rangle = 0 \quad\text{for all } x \in \mathbb{R}^d. \tag{3.7}$$

The decomposition (3.7) yields the Pythagorean identity

$$\|x\|^2 = \|\Pi_C(x)\|^2 + \|\Pi_{C^\circ}(x)\|^2 \quad\text{for all } x \in \mathbb{R}^d. \tag{3.8}$$

It also implies the distance formulas

$$\operatorname{dist}(x, C) = \|x - \Pi_C(x)\| = \|\Pi_{C^\circ}(x)\| \quad\text{for all } x \in \mathbb{R}^d. \tag{3.9}$$

The squared norm of the projection has a nice regularity property, which follows from a short argument based on (3.6), (3.7), and (3.9):

$$\nabla \|\Pi_C(x)\|^2 = 2\, \Pi_C(x) \quad\text{for all } x \in \mathbb{R}^d. \tag{3.10}$$

Finally, the projection map decomposes under Cartesian products. For two cones $C_1 \in \mathcal{C}_{d_1}$ and $C_2 \in \mathcal{C}_{d_2}$, the product $C_1 \times C_2 \in \mathcal{C}_{d_1 + d_2}$, and

$$\Pi_{C_1 \times C_2}((x_1, x_2)) = (\Pi_{C_1}(x_1), \Pi_{C_2}(x_2)) \quad\text{for all } x_1 \in \mathbb{R}^{d_1} \text{ and } x_2 \in \mathbb{R}^{d_2}. \tag{3.11}$$

The relation (3.11) is easy to check directly.
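
The decomposition (3.7) and the Pythagorean identity (3.8) can be checked by hand for the nonnegative orthant, whose polar (under the definition in Section 3.3) is the nonpositive orthant. The following sketch does this numerically; the function name `moreau_parts` is our own.

```python
def moreau_parts(x):
    """Return (Pi_C(x), Pi_{C polar}(x)) for C the nonnegative orthant.

    Projection onto C keeps the positive part of each coordinate; projection onto
    the polar cone (the nonpositive orthant) keeps the negative part.
    """
    onto_cone = [max(xi, 0.0) for xi in x]
    onto_polar = [min(xi, 0.0) for xi in x]
    return onto_cone, onto_polar


def check_decomposition(x, tol=1e-12):
    """Verify (3.7) and (3.8) for the orthant at the point x."""
    p, q = moreau_parts(x)
    recombines = all(abs(pi + qi - xi) <= tol for pi, qi, xi in zip(p, q, x))
    orthogonal = abs(sum(pi * qi for pi, qi in zip(p, q))) <= tol
    sq = lambda v: sum(vi * vi for vi in v)
    pythagoras = abs(sq(x) - sq(p) - sq(q)) <= tol
    return recombines and orthogonal and pythagoras
```

The two parts have disjoint supports, which is why their inner product vanishes exactly in this example.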


3.7. Probability. The symbol $\mathbb{P}\{\cdot\}$ denotes the probability of an event, and $\mathbb{E}[\cdot]$ returns the expectation of a random variable. We reserve the letter $g$ for a standard normal vector, i.e., a vector whose entries are independent normal variables with mean zero and variance one. We reserve the letter $\theta$ for a random vector uniformly distributed on the Euclidean unit sphere. The set $\mathsf{O}_d$ of orthogonal matrices is a compact Lie group, so it admits an invariant Haar (i.e., uniform) probability measure. We reserve the letter $Q$ for a uniformly random element of $\mathsf{O}_d$, and we refer to $Q$ as a random orthogonal basis.
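
One standard way to draw a random orthogonal basis is to orthonormalize independent standard normal vectors; the resulting matrix is distributed according to the Haar measure. The sketch below does this with modified Gram-Schmidt in pure Python. It is meant for small dimensions only, and the function name is our own.

```python
import math
import random


def random_orthogonal_basis(d, rng):
    """Return d orthonormal rows obtained by Gram-Schmidt on i.i.d. normal vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows already accepted.
        for q in rows:
            c = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # holds almost surely; guards against numerical degeneracy
            rows.append([vi / norm for vi in v])
    return rows
```

For larger dimensions a QR factorization from a numerical linear algebra library is the usual choice; in that case the signs of the diagonal of the triangular factor must be normalized for the output to be uniformly distributed.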

4.  Calculating  the  statistical  dimension 

Section  2  demonstrates  that  we  can  pinpoint  the  phase  transitions  in  random  linear  inverse  problems 
and  random  demixing  problems  as  soon  as  we  know  the  statistical  dimension  of  the  appropriate  descent 
cones.  The  goal  of  this  section  is  to  show  that  we  can  obtain  highly  accurate  approximation  formulas  for  the 
statistical  dimension  of  a  cone  with  a  modest  amount  of  effort. 

Section  4.1  develops  some  basic  properties  of  the  statistical  dimension  that  aid  this  investigation.  The 
rest  of  the  section  explains  how  to  compute  the  statistical  dimension  for  several  families  of  cones  that 
arise  in  applications.  This  discussion  includes  a  recipe  for  bounding  the  statistical  dimension  of  a  descent 
cone;  Theorem  4.5  ensures  that  the  recipe  produces  an  accurate  answer  for  many  problems  of  interest.  We 
summarize  our  calculations  in  Table  4.1  and  in  Figure  4.1. 

4.1.  Basic  facts  about  the  statistical  dimension.  The  statistical  dimension  has  a  number  of  valuable 
properties  that  are  readily  apparent  from  Definition  2.1.  These  facts  provide  useful  tools  for  making 
computations,  and  they  strengthen  the  analogy  between  the  statistical  dimension  of  a  cone  and  the  linear 
dimension  of  a  subspace. 

Proposition 4.1 (Properties of statistical dimension). Let $C \in \mathcal{C}_d$ be a closed convex cone. The statistical dimension obeys the following laws.

(1) Gaussian formulation. The statistical dimension is defined as

$$\delta(C) := \mathbb{E}\big[\|\Pi_C(g)\|^2\big] \quad\text{where}\quad g \sim \textsc{normal}(0, I_d). \tag{4.1}$$

(2) Spherical formulation. An equivalent definition states that

$$\delta(C) := d\, \mathbb{E}\big[\|\Pi_C(\theta)\|^2\big] \quad\text{where}\quad \theta \sim \textsc{uniform}(\mathsf{S}^{d-1}). \tag{4.2}$$

(3) Rotational invariance. The statistical dimension does not depend on the orientation of the cone:

$$\delta(UC) = \delta(C) \quad\text{for each } U \in \mathsf{O}_d. \tag{4.3}$$

(4) Subspaces. For a subspace $L \subset \mathbb{R}^d$, the statistical dimension satisfies $\delta(L) = \dim(L)$.

(5) Polarity. The statistical dimension can also be expressed in terms of the polar cone:

$$\delta(C) = \mathbb{E}\big[\operatorname{dist}^2(g, C^\circ)\big]. \tag{4.4}$$

(6) Totality. The total statistical dimension of a cone and its polar equals the ambient dimension:

$$\delta(C) + \delta(C^\circ) = d. \tag{4.5}$$

This generalizes the property $\dim(L) + \dim(L^\perp) = d$ for each subspace $L \subset \mathbb{R}^d$.

(7) Direct products. For each cone $K \in \mathcal{C}_{d'}$,

$$\delta(C \times K) = \delta(C) + \delta(K). \tag{4.6}$$

In particular, the statistical dimension is invariant under embedding:

$$\delta(C \times \{0_{d'}\}) = \delta(C).$$

The relation (4.6) generalizes the rule $\dim(L \times M) = \dim(L) + \dim(M)$ for linear subspaces $L$ and $M$.

(8) Monotonicity. For each cone $K \in \mathcal{C}_d$, the inclusion $C \subset K$ implies that $\delta(C) \le \delta(K)$.

Proof. The Gaussian formulation in (4.1) simply repeats Definition 2.1. To derive the spherical formulation (4.2) from (4.1), we introduce the spherical decomposition $g = R\theta$, where $R := \|g\|$ is a chi random variable with $d$ degrees of freedom that is independent from the spherical variable $\theta$. Use nonnegative homogeneity to draw the term $R$ out of the projection and the squared norm, factor the expectation via independence, and note that $\mathbb{E}[R^2] = d$.

The rotational invariance property (4.3) follows immediately from the fact that a standard normal vector is rotationally invariant.

To compute the statistical dimension of a subspace, note that the Euclidean projection of a standard normal vector onto a subspace has the standard normal distribution supported on that subspace, so its expected squared norm equals the dimension of the subspace.

The polar identity (4.4) is a direct consequence of the distance formula (3.9), which implies that $\|\Pi_C(g)\| = \operatorname{dist}(g, C^\circ)$. Similarly, the totality law (4.5) follows from the Pythagorean identity (3.8).

We obtain the direct product rule (4.6) from the observation (3.11) that projection splits over a direct product, coupled with the fact that projecting a standard normal vector onto each of two orthogonal subspaces results in two independent standard normal vectors.

Finally, we verify the monotonicity law. Polarity reverses inclusion, so $K^\circ \subset C^\circ$. Using the polarity identity (4.4) twice, we obtain

$$\delta(C) = \mathbb{E}\big[\operatorname{dist}^2(g, C^\circ)\big] \le \mathbb{E}\big[\operatorname{dist}^2(g, K^\circ)\big] = \delta(K).$$

This completes the recitation. □
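
The Gaussian and spherical formulations (4.1) and (4.2) can be compared empirically. The sketch below estimates both for the nonnegative orthant, where the projection is the coordinatewise positive part and the totality law (4.5) together with self-duality gives the value $d/2$. The function names and the Monte Carlo setup are our own illustrative choices.

```python
import random


def proj_sq_norm_orthant(x):
    """Squared norm of the projection of x onto the nonnegative orthant."""
    return sum(xi * xi for xi in x if xi > 0.0)


def estimate_statdim_orthant(d, trials, rng):
    """Monte Carlo estimates of delta(C) via (4.1) and via (4.2), C = nonnegative orthant."""
    gauss_total = 0.0
    sphere_total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        r2 = sum(gi * gi for gi in g)
        p2 = proj_sq_norm_orthant(g)
        gauss_total += p2                # Gaussian formulation (4.1)
        # theta = g / ||g|| is uniform on the sphere, and the projection is
        # nonnegative homogeneous, so ||Pi_C(theta)||^2 = ||Pi_C(g)||^2 / ||g||^2.
        sphere_total += d * p2 / r2      # spherical formulation (4.2)
    return gauss_total / trials, sphere_total / trials
```

Both estimates should be close to $d/2$; the spherical estimate typically fluctuates less because it removes the randomness in $\|g\|$.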

The  statistical  dimension  also  enjoys  some  deeper  properties.  We  reserve  these  results  until  Section  5.3, 
which  gives  us  time  to  introduce  additional  tools  from  integral  geometry. 

4.2.  Self-dual  cones.  We  say  that  a  cone  C  is  self-dual  when  C°  =  -C.  Self-dual  cones  are  ubiquitous  in  the 
theory  and  practice  of  convex  optimization.  Here  are  three  important  examples: 

(1) The nonnegative orthant. The cone $\mathbb{R}^d_+ := \{x \in \mathbb{R}^d : x_i \ge 0 \text{ for } i = 1, \dots, d\}$ is self-dual.

(2) The second-order cone. The cone $L^{d+1} := \{(x, \tau) \in \mathbb{R}^{d+1} : \|x\| \le \tau\}$ is self-dual. This example is sometimes called the Lorentz cone or the ice-cream cone.

(3) Symmetric positive-semidefinite matrices. The cone $\mathbb{S}^{n \times n}_+ := \{X \in \mathbb{R}^{n \times n}_{\mathrm{sym}} : X \succeq 0\}$ is self-dual, where the curly inequality denotes the semidefinite order. Note that the linear space of $n \times n$ symmetric matrices has dimension $\tfrac{1}{2} n(n+1)$.

For  a  self-dual  cone,  the  computation  of  the  statistical  dimension  is  particularly  simple;  cf.  [CRPW12,  Cor.  3.8]. 
The  first  three  entries  in  Table  4.1  follow  instantly  from  this  result. 
Table 4.1: The statistical dimensions of some convex cones.

Cone | Notation | Statistical dimension | Location
The nonnegative orthant | $\mathbb{R}^d_+$ | $\tfrac{1}{2} d$ | Sec. 4.2
The second-order cone | $L^{d+1}$ | $\tfrac{1}{2}(d+1)$ | Sec. 4.2
Symmetric positive-semidefinite matrices | $\mathbb{S}^{n \times n}_+$ | $\tfrac{1}{4} n(n+1)$ | Sec. 4.2
Circular cone in $\mathbb{R}^d$ of angle $\alpha$ | $\mathrm{Circ}_d(\alpha)$ | $d \sin^2(\alpha) + O(1)$ | Sec. 4.3
Chambers of finite reflection groups acting on $\mathbb{R}^d$ | $C_A$ | $\log(d) + O(1)$ | Sec. 4.7
 | $C_{BC}$ | $\tfrac{1}{2}\log(d) + O(1)$ | Sec. 4.7

Proposition 4.2 (Self-dual cones). Let $C \in \mathcal{C}_d$ be a self-dual cone. The statistical dimension $\delta(C) = \tfrac{1}{2} d$.

Proof. Just observe that $\delta(C) = \tfrac{1}{2}[\delta(C) + \delta(C^\circ)] = \tfrac{1}{2} d$. The first identity holds because of the self-dual property of the cone and the rotational invariance (4.3) of statistical dimension. The second equality follows from the totality law (4.5). □

4.3. Circular cones. The circular cone $\mathrm{Circ}_d(\alpha)$ in $\mathbb{R}^d$ with angle $0 \le \alpha \le \tfrac{\pi}{2}$ is defined as

$$\mathrm{Circ}_d(\alpha) := \{x \in \mathbb{R}^d : x_1 \ge \|x\| \cos(\alpha)\}.$$

In particular, the cone $\mathrm{Circ}_d(\tfrac{\pi}{4})$ is isometric to the second-order cone $L^d$. Circular cones have numerous applications in optimization; we refer the reader to [BV04, Sec. 4], [BTN01, Sec. 3], and [AG03] for details.

We can obtain an accurate expression for the statistical dimension of a circular cone using trigonometry and basic asymptotic methods.

Proposition 4.3 (Circular cones). The statistical dimension of a circular cone satisfies

$$\delta(\mathrm{Circ}_d(\alpha)) = d \sin^2(\alpha) + O(1). \tag{4.7}$$

The error term is approximately equal to $\cos(2\alpha)$. See Figure 4.1 [left] for a plot of (4.7).

Turn to Appendix C.1 for the proof, which seems to be original. Even though the formula in Proposition 4.3 is simple, it already gives an accurate approximation in moderate dimensions.
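
Proposition 4.3 is easy to probe by simulation, because the squared norm of the projection onto a circular cone depends only on the first coordinate and the norm of the remaining coordinates. The sketch below is our own illustration (the function names and the two-dimensional reduction are not taken from Appendix C.1): a point inside the cone is its own projection, a point in the polar cone projects to the origin, and every other point projects onto the nearest boundary ray.

```python
import math
import random


def proj_sq_norm_circular(x, alpha):
    """Squared norm of the projection of x onto Circ_d(alpha), with axis e_1."""
    t = x[0]
    r = math.sqrt(sum(xi * xi for xi in x[1:]))
    norm = math.hypot(t, r)
    if t >= norm * math.cos(alpha):       # x lies in the cone
        return norm * norm
    if -t >= norm * math.sin(alpha):      # x lies in the polar cone
        return 0.0
    # Otherwise project onto the boundary ray with direction (cos a, sin a)
    # in the plane spanned by e_1 and the remaining coordinates of x.
    return (t * math.cos(alpha) + r * math.sin(alpha)) ** 2


def estimate_statdim_circular(d, alpha, trials, rng):
    """Monte Carlo estimate of delta(Circ_d(alpha)) via the Gaussian formulation (4.1)."""
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += proj_sq_norm_circular(g, alpha)
    return total / trials
```

For $\alpha = \pi/4$ the cone is isometric to a self-dual cone, so the estimate should be close to $d/2$, in agreement with $d \sin^2(\alpha)$ and a vanishing correction $\cos(2\alpha)$.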


4.4.  A  recipe  for  the  statistical  dimension  of  a  descent  cone.  Theorems  II  and  III  allow  us  to  locate  the 
phase  transition  in  a  regularized  inverse  problem  with  random  data.  To  apply  these  results,  we  must  be  able 
to  compute  the  statistical  dimension  of  a  descent  cone  associated  with  the  regularizer.  In  this  section,  we 
describe  a  method  that  delivers  a  superb  upper  bound  for  the  statistical  dimension  of  a  descent  cone.  This 
technique  is  based  on  an  elegant  application  of  polarity.  Stojnic  [Sto09]  developed  the  basic  argument,  which 
Chandrasekaran  et  al.  [CRPW12]  subsequently  refined.  We  have  undertaken  some  extra  technical  work  to 
prove  that  the  upper  bound  is  accurate  for  the  most  important  examples. 

Proposition 4.4 (The statistical dimension of a descent cone). Let $f$ be a proper convex function. Assume that the subdifferential $\partial f(x)$ is nonempty, compact, and does not contain the origin. Then

$$\delta(\mathcal{D}(f, x)) \le \inf_{\tau \ge 0} \mathbb{E}\big[\operatorname{dist}^2(g, \tau \cdot \partial f(x))\big]. \tag{4.8}$$

The function on the right-hand side of (4.8), namely

$$F : \tau \mapsto \mathbb{E}\big[\operatorname{dist}^2(g, \tau \cdot \partial f(x))\big] \quad\text{for } \tau \ge 0, \tag{4.9}$$

is strictly convex, continuous at $\tau = 0$, and differentiable for $\tau > 0$. It achieves its minimum at a unique point.


Proof. We use the polarity relation (4.4) to compute the statistical dimension:

$$\delta(\mathcal{D}(f, x)) = \mathbb{E}\big[\operatorname{dist}^2(g, \mathcal{D}(f, x)^\circ)\big] = \mathbb{E}\Big[\operatorname{dist}^2\Big(g, \bigcup_{\tau \ge 0} \tau \cdot \partial f(x)\Big)\Big] = \mathbb{E}\Big[\inf_{\tau \ge 0} \operatorname{dist}^2(g, \tau \cdot \partial f(x))\Big].$$

The second identity follows from the fact (3.3) that, under our technical assumptions, the polar of the descent cone is the cone generated by the subdifferential. The third identity holds because the distance to a union is the infimal distance to any one of its members. To reach (4.8), we apply Jensen's inequality. See Lemma B.2 in Appendix B.1 for the proof of the remaining claims. □

Figure 4.1: Asymptotic statistical dimension computations. In each panel, we take the dimensional parameters to infinity. [left] Circular cones. The plot shows the normalized statistical dimension $\delta(\cdot)/d$ of the circular cone $\mathrm{Circ}_d(\alpha)$. [center] $\ell_1$ descent cones. The curve traces the normalized statistical dimension $\delta(\cdot)/d$ of the descent cone of the $\ell_1$ norm on $\mathbb{R}^d$ at a vector with $\lfloor \rho d \rfloor$ nonzero entries. [right] Schatten 1-norm descent cones. The normalized statistical dimension $\delta(\cdot)/(mn)$ of the descent cone of the $S_1$ norm on $\mathbb{R}^{m \times n}$ at a matrix with rank $\lfloor \rho m \rfloor$ for several fixed aspect ratios $\nu = m/n$. As the aspect ratio $\nu \to 0$, the limiting curve is $\rho \mapsto 2\rho - \rho^2$.

Proposition  4.4  suggests  a  method,  displayed  as  Recipe  4.1,  for  bounding  the  statistical  dimension  of  a 
descent  cone.  In  the  next  two  sections,  we  use  this  approach  to  estimate  the  statistical  dimension  of  the 
descent cone of the $\ell_1$ norm at a sparse vector and the Schatten 1-norm at a low-rank matrix.

The literature contains empirical evidence that Recipe 4.1 leads to very accurate upper bounds. We have
obtained  an  error  estimate  which  shows  that  the  recipe  is  precise  for  many  important  examples. 

Theorem 4.5 (Error bound for descent cone recipe). Let $f$ be a norm on $\mathbb{R}^d$, and fix a nonzero point $x$. Then

$$\Big| \delta(\mathcal{D}(f, x)) - \inf_{\tau \ge 0} F(\tau) \Big| \le \frac{2 \sup\{\|s\| : s \in \partial f(x)\}}{f(x / \|x\|)}, \tag{4.10}$$

where the function $F$ is defined in (4.9).

The proof of Theorem 4.5 is technical in nature, so we defer the details to Appendix B.2. The application of this result requires some care because many different vectors $x$ can generate the same subdifferential $\partial f(x)$ and hence the same descent cone $\mathcal{D}(f, x)$. From this class of vectors, we ought to select one that maximizes the value $f(x / \|x\|)$.

Remark  4.6  (Improved  error  bounds).  We  have  established  several  variants  of  Theorem  4.5.  For  example, 
we  can  study  arbitrary  convex  functions  instead  of  norms  if  we  move  to  an  appropriate  asymptotic  setting. 
Unfortunately,  we  have  not  identified  the  optimal  form  for  the  error  bound  (4.10)  in  the  general  case.  As 
such, we have chosen to present a simple result that justifies our analysis of phase transitions in $\ell_1$ and $S_1$
minimization  problems. 

4.5. Descent cones of the $\ell_1$ norm. When we wish to solve an inverse problem with a sparse unknown, we often use the $\ell_1$ norm as a regularizer; cf. (2.6), (2.10), and (2.11). Our next result summarizes the calculations required to obtain the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector. When we combine this proposition with Theorems II and III, we obtain the exact location of the phase transition for $\ell_1$ regularized inverse problems whose dimension is large.
Recipe 4.1: The statistical dimension of a descent cone.

Assume that $f$ is a proper convex function on $\mathbb{R}^d$.

Assume that the subdifferential $\partial f(x)$ is nonempty, compact, and does not contain the origin.

(1) Identify the subdifferential $S = \partial f(x)$.

(2) For each $\tau \ge 0$, compute $F(\tau) = \mathbb{E}[\operatorname{dist}^2(g, \tau S)]$.

(3) Find the unique solution, if it exists, to the stationary equation $F'(\tau) = 0$.

(4) If the stationary equation has a solution $\tau_\star$, then $\delta(\mathcal{D}(f, x)) \le F(\tau_\star)$.

(5) Otherwise, the bound is vacuous: $\delta(\mathcal{D}(f, x)) \le F(0) = d$.
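
The steps of the recipe can be imitated numerically. The sketch below treats the $\ell_1$ norm, whose subdifferential at $x$ consists of the vectors that equal $\operatorname{sign}(x_i)$ on the support of $x$ and lie in $[-1, 1]$ elsewhere. It replaces the exact expectation in step (2) by a Monte Carlo average and replaces the stationary equation in step (3) by a search over a user-supplied grid of $\tau$ values; both substitutions, and all function names, are our own simplifications rather than the method used for the figures.

```python
import random


def dist2_to_scaled_l1_subdiff(g, signs, tau):
    """Squared distance from g to tau * (subdifferential of the l1 norm at x).

    signs[i] is sign(x_i), taking values in {-1, 0, +1}.
    """
    total = 0.0
    for gi, si in zip(g, signs):
        if si != 0:
            total += (gi - tau * si) ** 2      # coordinate pinned to tau * sign(x_i)
        else:
            excess = abs(gi) - tau             # coordinate free in [-tau, tau]
            if excess > 0.0:
                total += excess * excess
    return total


def recipe_upper_bound(signs, taus, trials, rng):
    """Grid-search version of Recipe 4.1: return (best tau, estimated F at that tau)."""
    d = len(signs)
    # Reuse the same Gaussian samples for every tau so the comparison is stable.
    samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(trials)]
    best_tau, best_val = None, float("inf")
    for tau in taus:
        val = sum(dist2_to_scaled_l1_subdiff(g, signs, tau) for g in samples) / trials
        if val < best_val:
            best_tau, best_val = tau, val
    return best_tau, best_val
```

For a vector in dimension 100 with 10 nonzero entries and a grid of step 0.1, the estimated bound comes out near one third of the ambient dimension, which is the behavior that Proposition 4.7 quantifies.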


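Proposition 4.7 (Descent cones of the $\ell_1$ norm). Let $x$ be a vector in $\mathbb{R}^d$ with $s$ nonzero entries. Then the normalized statistical dimension of the descent cone of the $\ell_1$ norm at $x$ satisfies the bounds

$$\psi(s/d) - \frac{2}{\sqrt{sd}} \le \frac{\delta(\mathcal{D}(\|\cdot\|_1, x))}{d} \le \psi(s/d). \tag{4.11}$$

The function $\psi : [0, 1] \to [0, 1]$ is defined as

$$\psi(\rho) := \inf_{\tau \ge 0} \Big\{ \rho (1 + \tau^2) + (1 - \rho) \sqrt{\tfrac{2}{\pi}} \Big[ (1 + \tau^2) \int_\tau^\infty e^{-u^2/2} \, du - \tau e^{-\tau^2/2} \Big] \Big\}. \tag{4.12}$$

The infimum in (4.12) is achieved for the unique positive $\tau$ that solves the stationary equation

$$\sqrt{\tfrac{2}{\pi}} \int_\tau^\infty \Big( \frac{u}{\tau} - 1 \Big) e^{-u^2/2} \, du = \frac{\rho}{1 - \rho}. \tag{4.13}$$

See Figure 4.1 [center] for a plot of the function (4.12).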


Proposition 4.7 is a direct consequence of Recipe 4.1 and the error bound in Theorem 4.5. See Appendix C.2
for details of the proof; Appendix A.2 explains the numerical aspects.
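
As a rough illustration of how the curve (4.12) can be evaluated, the sketch below solves the stationary equation (4.13) by bisection and substitutes the root into the objective. It relies on the identity $\sqrt{2/\pi} \int_\tau^\infty e^{-u^2/2} \, du = \operatorname{erfc}(\tau/\sqrt{2})$ to express both integrals in closed form. This is our own minimal sketch and is not claimed to match the procedure in Appendix A.2; the function names and the bisection bracket are assumptions.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def objective(rho, tau):
    """The expression inside the infimum in (4.12)."""
    tail = math.erfc(tau / math.sqrt(2.0))   # sqrt(2/pi) * int_tau^inf exp(-u^2/2) du
    inner = (1.0 + tau * tau) * tail - SQRT_2_OVER_PI * tau * math.exp(-0.5 * tau * tau)
    return rho * (1.0 + tau * tau) + (1.0 - rho) * inner


def stationary_tau(rho, lo=1e-9, hi=40.0, iters=200):
    """Solve (4.13) by bisection; the left-hand side is decreasing in tau."""
    target = rho / (1.0 - rho)

    def lhs(tau):
        return (SQRT_2_OVER_PI * math.exp(-0.5 * tau * tau) / tau
                - math.erfc(tau / math.sqrt(2.0)))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi(rho):
    """Evaluate psi(rho) from (4.12) for rho in [0, 1]."""
    if rho <= 0.0:
        return 0.0   # the infimum is approached as tau grows without bound
    if rho >= 1.0:
        return 1.0   # the objective reduces to 1 + tau^2, minimized at tau = 0
    return objective(rho, stationary_tau(rho))
```

Multiplying `psi(s / d)` by `d` gives the predicted location of the phase transition in Theorem II for an $s$-sparse vector in dimension $d$, up to the error allowed by (4.11).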

Let us emphasize the following consequences of Proposition 4.7. When the number $s$ of nonzeros in the vector $x$ is proportional to the ambient dimension $d$, the error in the statistical dimension calculation (4.11) is vanishingly small relative to the ambient dimension. When $x$ is sparser, it is more appropriate to compare the error with the statistical dimension itself. Thus,

$$\frac{\big| \delta(\mathcal{D}(\|\cdot\|_1, x)) - d \, \psi(s/d) \big|}{\delta(\mathcal{D}(\|\cdot\|_1, x))} \le \frac{2}{\sqrt{\delta(\mathcal{D}(\|\cdot\|_1, x))}} \quad\text{when } s \ge \sqrt{d} + 1.$$

We have used the observation that $\delta(\mathcal{D}(\|\cdot\|_1, x)) \ge s - 1$, which holds because $\mathcal{D}(\|\cdot\|_1, x)$ contains the $(s-1)$-dimensional subspace parallel with the face of the $\ell_1$ ball containing $x$.
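
The relative error bound follows from (4.11) in a few lines. One way to fill in the step, writing $\delta$ as shorthand for $\delta(\mathcal{D}(\|\cdot\|_1, x))$ and using only (4.11) and the lower bound $\delta \ge s - 1$, is sketched here.

```latex
\begin{align*}
  \bigl|\delta - d\,\psi(s/d)\bigr|
    &\le d \cdot \frac{2}{\sqrt{sd}} = 2\sqrt{d/s}
    && \text{by (4.11), after multiplying through by } d, \\
  s \ge \sqrt{d} + 1
    &\;\Longrightarrow\; s(s-1) \ge (\sqrt{d}+1)\sqrt{d} \ge d
    \;\Longrightarrow\; \sqrt{d/s} \le \sqrt{s-1} \le \sqrt{\delta}, \\
  \frac{\bigl|\delta - d\,\psi(s/d)\bigr|}{\delta}
    &\le \frac{2\sqrt{d/s}}{\delta}
    \le \frac{2\sqrt{\delta}}{\delta}
    = \frac{2}{\sqrt{\delta}} .
\end{align*}
```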

Aside  from  the  first  inequality  in  (4.11),  the  calculations  and  the  resulting  formulae  in  Proposition  4.7  are 
not  substantially  novel.  Most  of  the  existing  analysis  concerns  the  phase  transition  in  compressed  sensing, 
i.e., the $\ell_1$ minimization problem (2.6) with Gaussian measurements. In this setting, Donoho [Don06b] and
Donoho  &  Tanner  [DT09a]  obtained  an  asymptotic  upper  bound,  equivalent  to  the  upper  bound  in  (4.11), 
from  polytope  angle  calculations.  Stojnic  [Sto09]  established  the  same  asymptotic  upper  bound  using  a 
precursor  of  Recipe  4.1;  see  also  Chandrasekaran  et  al.  [CRPW12,  App.  C].  In  addition,  there  are  some 
heuristic  arguments,  based  on  ideas  from  statistical  physics,  that  lead  to  the  same  result,  cf.  [DMM09a] 
and  [DMM09b,  Sec.  17].  Very  recently,  Bayati  et  al.  [BLM12]  have  shown  that,  in  the  asymptotic  setting,  the 
compressed  sensing  problem  undergoes  a  phase  transition  at  the  location  predicted  by  (4.11). 


4.6. Descent cones of the Schatten 1-norm. When we wish to solve an inverse problem whose unknown is a low-rank matrix, we often use the Schatten 1-norm as a regularizer, as in (2.7) and (2.11). The following result gives a sharp asymptotic expression for the statistical dimension of the descent cone of the $S_1$ norm at a low-rank matrix. Together with Theorems II and III, this proposition allows us to identify the exact location of the phase transition for $S_1$ regularized inverse problems whose dimension is large.
Proposition 4.8 (Descent cones of the $S_1$ norm). Consider a sequence $\{X(r, m, n)\}$ of matrices where $X(r, m, n)$
has rank $r$ and dimension $m \times n$ with $m \leq n$. Suppose that $r, m, n \to \infty$ with limiting ratios $r/m \to \rho \in (0, 1)$ and
$m/n \to \nu \in (0, 1]$. Then

$$ \frac{\delta\big(\mathcal{D}(\|\cdot\|_{S_1}, X(r, m, n))\big)}{mn} \to \psi(\rho, \nu). \tag{4.14} $$


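The function $\psi : [0, 1] \times [0, 1] \to [0, 1]$ is defined as

$$ \psi(\rho, \nu) := \inf_{\tau \geq 0} \left\{ \rho\nu + (1 - \rho\nu) \left[ \rho(1 + \tau^2) + (1 - \rho) \int_{a_- \vee \tau}^{a_+} (u - \tau)^2 \, \varphi_y(u) \, du \right] \right\}. \tag{4.15} $$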


The quantity $y := (\nu - \rho\nu)/(1 - \rho\nu)$, and the limits of the integral are $a_\pm := 1 \pm \sqrt{y}$. The integral kernel $\varphi_y$ is a
probability density supported on $[a_-, a_+]$:

$$ \varphi_y(u) := \frac{1}{\pi y u} \sqrt{(u^2 - a_-^2)(a_+^2 - u^2)} \quad \text{for } u \in [a_-, a_+]. $$

The optimal value of $\tau$ in (4.15) satisfies the stationary equation

$$ \int_{a_- \vee \tau}^{a_+} \left( \frac{u}{\tau} - 1 \right) \varphi_y(u) \, du = \frac{\rho}{1 - \rho}. \tag{4.16} $$


See Figure 4.1 [right] for a visualization of the curve (4.15) as a function of $\rho$ for several choices of $\nu$. The operator
$\vee$ returns the maximum of two numbers.


See  Appendix  C.3  for  a  proof  of  Proposition  4.8.  Appendix  A.2  contains  details  of  the  numerical  calculation. 
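As a rough illustration of how (4.15) can be evaluated, the following sketch solves the stationary equation (4.16) by bisection and then plugs the resulting $\tau$ into (4.15). It is not the procedure of Appendix A.2; the function names (`mp_density`, `midpoint`, `objective`, `psi`), the midpoint quadrature, and the grid sizes are all illustrative assumptions.

```python
import math

def mp_density(u, y):
    """Kernel phi_y(u) of Proposition 4.8, supported on [1 - sqrt(y), 1 + sqrt(y)]."""
    a_minus, a_plus = 1.0 - math.sqrt(y), 1.0 + math.sqrt(y)
    inside = (u * u - a_minus ** 2) * (a_plus ** 2 - u * u)
    return math.sqrt(max(inside, 0.0)) / (math.pi * y * u)

def midpoint(f, lo, hi, n=2000):
    """Composite midpoint rule (an illustrative choice; it never evaluates the endpoints)."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def objective(tau, rho, nu):
    """The expression inside the infimum of (4.15) for a fixed tau >= 0."""
    y = (nu - rho * nu) / (1.0 - rho * nu)
    a_minus, a_plus = 1.0 - math.sqrt(y), 1.0 + math.sqrt(y)
    integral = midpoint(lambda u: (u - tau) ** 2 * mp_density(u, y),
                        max(a_minus, tau), a_plus)
    return rho * nu + (1.0 - rho * nu) * (rho * (1.0 + tau ** 2) + (1.0 - rho) * integral)

def psi(rho, nu, steps=50):
    """Solve (4.16) for tau by bisection, then return (psi(rho, nu), tau)."""
    y = (nu - rho * nu) / (1.0 - rho * nu)
    a_minus, a_plus = 1.0 - math.sqrt(y), 1.0 + math.sqrt(y)

    def residual(tau):
        lhs = midpoint(lambda u: (u / tau - 1.0) * mp_density(u, y),
                       max(a_minus, tau), a_plus)
        return lhs - rho / (1.0 - rho)

    lo, hi = 1e-9, a_plus  # residual is positive near zero and negative at a_plus
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return objective(tau, rho, nu), tau

if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.9):
        value, tau = psi(rho, 1.0)
        print(f"rho={rho:.1f}, nu=1.0: psi ~ {value:.4f} at tau ~ {tau:.4f}")
```

Bisection is reasonable here because the left-hand side of (4.16) decreases monotonically in $\tau$, and the objective in (4.15) is convex in $\tau$, so the stationary point is the minimizer.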

The literature contains several papers that, in effect, contain loose upper bounds for the statistical dimension
of the descent cones of the Schatten 1-norm [RXH11, OKH11]. We single out the work [OH10] of Oymak
&  Hassibi,  which  identifies  an  empirically  sharp  upper  bound  via  a  laborious  argument.  The  approach  here 
is  more  in  the  spirit  of  the  upper  bound  in  [CRPW12,  App.  C],  but  we  have  taken  extra  care  to  obtain  the 
asymptotically  correct  estimate. 


4.7. Normal cones to permutahedra. We close this section with a more sophisticated example. The (signed)
permutahedron generated by a vector $x \in \mathbb{R}^d$ is the convex hull of all (signed) coordinate permutations of the
vector:

$$ \mathcal{P}(x) := \operatorname{conv}\{\sigma(x) : \sigma \text{ a coordinate permutation}\} \tag{4.17} $$

$$ \mathcal{P}_\pm(x) := \operatorname{conv}\{\sigma_\pm(x) : \sigma_\pm \text{ a signed coordinate permutation}\}. \tag{4.18} $$

(A  signed  permutation  permutes  the  coordinates  of  a  vector  and  gives  each  one  an  arbitrary  sign.)  Figure  4.2 
displays  two  signed  permutahedra  and  the  normal  cone  at  a  vertex. 

In this section, we present an exact formula for the statistical dimension of the normal cone of a
permutahedron. In Section 9, we use this calculation to study an application in signal processing that was proposed
in [CRPW12, p. 812].

Proposition 4.9 (Normal cones of permutahedra). Suppose that $x$ has distinct entries. The statistical dimension
of the normal cone at a vertex of the (signed) permutahedron generated by $x$ satisfies

$$ \delta(\mathcal{N}(\mathcal{P}(x), x)) = H_d \quad\text{and}\quad \delta(\mathcal{N}(\mathcal{P}_\pm(x), x)) = \tfrac{1}{2} H_d, $$

where $H_d := \sum_{i=1}^{d} i^{-1}$ is the $d$th harmonic number.

The  proof  of  Proposition  4.9  appears  in  Appendix  C.4.  The  argument  illustrates  some  deep  connections 
between  conic  geometry  and  classical  combinatorics. 
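The unsigned case of Proposition 4.9 can be checked numerically. The sketch below assumes two facts that are not spelled out in this section: when $x$ has strictly decreasing entries, the normal cone $\mathcal{N}(\mathcal{P}(x), x)$ is the monotone cone $\{u : u_1 \geq u_2 \geq \dots \geq u_d\}$, and the statistical dimension equals $\mathbb{E}\,\|\Pi_C(g)\|^2$ for a standard normal vector $g$ (the Gaussian counterpart of the spherical formulation (4.2)). The projection is computed with the pool-adjacent-violators scheme; all function names are hypothetical.

```python
import random

def project_nonincreasing(g):
    """Euclidean projection onto {u : u_1 >= ... >= u_d} by pooling adjacent violators."""
    blocks = []  # each block stores [sum of entries, number of entries]
    for value in g:
        blocks.append([value, 1])
        # Merge while the previous block mean is smaller than the newest block mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]

def harmonic(d):
    """The d-th harmonic number H_d."""
    return sum(1.0 / i for i in range(1, d + 1))

def estimate_statistical_dimension(d, trials=20000, seed=0):
    """Monte Carlo estimate of E ||Pi_C(g)||^2 for the monotone cone C in R^d."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        acc += sum(u * u for u in project_nonincreasing(g))
    return acc / trials

if __name__ == "__main__":
    d = 6
    print("Monte Carlo estimate:", estimate_statistical_dimension(d))
    print("Harmonic number H_d: ", harmonic(d))
```

The estimate should agree with $H_d$ up to Monte Carlo error, which shrinks like the inverse square root of the number of trials.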


5.  Tools  from  conic  integral  geometry 

To  prove  that  the  statistical  dimension  controls  the  location  of  phase  transitions  in  random  convex 
optimization problems, we rely on methods from conic integral geometry, the field of mathematics concerned
with  geometric  properties  of  convex  cones  that  remain  invariant  under  rotations  and  reflections.  Here  are 
some  of  the  guiding  questions  in  this  area: 

•  What  is  the  probability  that  a  random  unit  vector  lies  at  most  a  specified  distance  from  a  fixed  cone? 

•  What  is  the  probability  that  a  randomly  rotated  cone  shares  a  ray  with  a  fixed  cone? 
Figure 4.2: Normal cone at the vertex of a permutahedron. The signed permutahedron $\mathcal{P}_\pm(x)$ generated
by [left] the vector $x = (3, -1)$ and [right] the vector $x = (3, -2)$. In each panel, the darker cone is the normal
cone $\mathcal{N}(\mathcal{P}_\pm(x), x)$, and the lighter cone is its polar. Note that the normal cone does not depend on the generator
$x$ provided that the entries of $x$ are distinct.


The  theory  of  conic  integral  geometry  offers  beautiful  and  precise  answers,  phrased  in  terms  of  a  set  of 
geometric  invariants  called  conic  intrinsic  volumes. 

In  the  next  section,  we  introduce  the  intrinsic  volumes  of  a  cone,  we  compute  the  intrinsic  volumes  of 
some  basic  cones,  and  we  state  the  key  facts  about  intrinsic  volumes.  Section  5.2  contains  more  advanced 
formulas  from  conic  integral  geometry,  which  are  essential  tools  for  identifying  phase  transitions.  We  revisit 
the  statistical  dimension  in  Section  5.3,  where  we  develop  some  elegant  new  characterizations. 

The material in this section is adapted from the book [SW08, Sec. 6.5] and the dissertation [Ame11]. The
foundational research in this area is due to Santaló [San76, Part IV]. Modern treatments depend on the work
of  Glasauer  [Gla95,  Gla96].  In  these  sources,  the  theory  is  presented  in  terms  of  spherical  geometry,  rather 
than  in  terms  of  conical  geometry.  As  noted  in  [AB12],  the  two  approaches  are  equivalent,  but  the  conic 
viewpoint  provides  simpler  formulas  and  has  key  benefits  that  are  only  revealed  through  deeper  structural 
investigations. 

5.1.  Conic  intrinsic  volumes.  We  begin  with  the  definition  of  the  intrinsic  volumes  of  a  convex  cone. 

Definition 5.1 (Intrinsic volumes). Let $C \in \mathcal{C}_d$ be a polyhedral cone. For each $k = 0, 1, 2, \dots, d$, the $k$th (conic)
intrinsic volume $v_k(C)$ is given by

$$ v_k(C) := \mathbb{P}\{\Pi_C(g) \text{ lies in the relative interior of a } k\text{-dimensional face of } C\}. $$

As usual, $g$ is a standard normal vector in $\mathbb{R}^d$. We extend this definition to a general closed convex cone in $\mathcal{C}_d$
by approximating it with a sequence of polyhedral cones.

For polyhedral cones, Definition 5.1 gives an attractive intuition. We can decompose the ambient space into
$d + 1$ disjoint regions, where the $k$th region contains the points whose projection onto the cone lies in the
relative interior of a $k$-dimensional face; the $k$th intrinsic volume reflects the proportion of the ambient space
comprised by the $k$th region. We see immediately that the sequence of intrinsic volumes forms a probability
distribution on $\{0, 1, 2, \dots, d\}$. The definition also delivers insight about several fundamental examples.

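Example 5.2 (Linear subspaces). Let $L_j \subset \mathbb{R}^d$ be a $j$-dimensional subspace. Then $L_j$ is a polyhedral cone with
precisely one face, so the map $\Pi_{L_j}$ projects every point onto this $j$-dimensional face. Thus,

$$ v_k(L_j) = \begin{cases} 1, & k = j, \\ 0, & k \neq j, \end{cases} \quad\text{for } k = 0, 1, 2, \dots, d. $$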
Example 5.3 (The nonnegative orthant). The nonnegative orthant $\mathbb{R}^d_+$ is a polyhedral cone, and Table 4.1 lists
its statistical dimension as $\delta(\mathbb{R}^d_+) = \tfrac{1}{2} d$. The projection $\Pi_{\mathbb{R}^d_+}(g)$ lies in the relative interior of a $k$-dimensional
face of the orthant if and only if exactly $k$ coordinates of $g$ are positive. Each coordinate of $g$ is positive with
probability one-half and negative with probability one-half, and the coordinates are independent. Therefore,
the intrinsic volumes of the orthant are given by

$$ v_k(\mathbb{R}^d_+) = 2^{-d} \binom{d}{k} \quad\text{for } k = 0, 1, 2, \dots, d. $$


In other words, the intrinsic volumes coincide with the probability density of a Binomial$(d, \tfrac{1}{2})$ random
variable. From this representation, we learn that the largest intrinsic volume of the orthant occurs at the
index $k = \lfloor \tfrac{1}{2} d \rfloor$, and the large intrinsic volumes concentrate sharply around this index. We will prove that this
type of behavior is generic!
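A few lines of code make the concentration in Example 5.3 concrete. The sketch below tabulates the intrinsic volumes of the orthant from the binomial formula and measures how much of the total mass sits within four binomial standard deviations of $\tfrac{1}{2} d$. The dimension $d = 400$ and the window width are arbitrary illustrative choices.

```python
from math import comb, sqrt

def orthant_intrinsic_volumes(d):
    """Intrinsic volumes v_0, ..., v_d of the nonnegative orthant in R^d (Example 5.3)."""
    return [comb(d, k) / 2 ** d for k in range(d + 1)]

d = 400
v = orthant_intrinsic_volumes(d)

mean = sum(k * vk for k, vk in enumerate(v))       # index averaged under the volumes
mode = max(range(d + 1), key=lambda k: v[k])       # index of the largest volume
window = 2 * sqrt(d)                               # four standard deviations, each sqrt(d)/2
mass_near_mean = sum(vk for k, vk in enumerate(v) if abs(k - d / 2) <= window)

if __name__ == "__main__":
    print("sum of volumes:", sum(v))
    print("mean index:", mean, " mode index:", mode)
    print("mass within", window, "of d/2:", mass_near_mean)
```

Although the index ranges over $d + 1 = 401$ values, almost all of the mass lies in a window of about $81$ indices around $d/2$.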


For a nonpolyhedral cone $C \in \mathcal{C}_d$, the projection formula in Definition 5.1 breaks down, and we can no
longer interpret the $k$th intrinsic volume in terms of the $k$-dimensional faces of $C$. Furthermore, it takes some
work to give sense to the phrase “approximation by polyhedral cones.” We refer to the book [SW08, Sec. 6.5]
or the thesis [Ame11] for these important details.

In  spite  of  these  caveats,  the  intrinsic  volumes  of  an  arbitrary  closed  convex  cone  still  form  a  probability 
distribution.  The  next  result  quantifies  this  statement  and  several  other  basic  relationships. 


Fact 5.4 (Properties of intrinsic volumes). Let $C \in \mathcal{C}_d$ be a closed convex cone. The intrinsic volumes of the cone
obey the following laws.

(1) Distribution. The intrinsic volumes describe a probability distribution on $\{0, 1, \dots, d\}$:

$$ \sum_{k=0}^{d} v_k(C) = 1 \quad\text{and}\quad v_k(C) \geq 0 \quad\text{for } k = 0, 1, 2, \dots, d. \tag{5.1} $$

(2) Polarity. The intrinsic volumes reverse under polarity:

$$ v_k(C) = v_{d-k}(C^\circ) \quad\text{for } k = 0, 1, 2, \dots, d. \tag{5.2} $$

(3) Gauss-Bonnet formula. When $C$ is not a subspace,

$$ \sum_{\substack{k=0 \\ k \text{ even}}}^{d} v_k(C) = \sum_{\substack{k=1 \\ k \text{ odd}}}^{d} v_k(C) = \frac{1}{2}. \tag{5.3} $$

(4) Direct products. For each closed convex cone $K \in \mathcal{C}_{d'}$,

$$ v_k(C \times K) = \sum_{i+j=k} v_i(C) \cdot v_j(K) \quad\text{for } k = 0, 1, 2, \dots, d + d'. \tag{5.4} $$

The facts (5.1), (5.2), and (5.3) are drawn from [SW08, Sec. 6.5]. The product rule (5.4) appears in [Ame11,
Prop. 4.4.13].
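The product rule (5.4) says that intrinsic volumes of a product cone are a discrete convolution. As a small sanity check, the sketch below builds the orthant $\mathbb{R}^d_+$ as a $d$-fold product of the ray $\mathbb{R}_+$, whose intrinsic volumes are $(\tfrac{1}{2}, \tfrac{1}{2})$ by Example 5.3 with $d = 1$, and compares the result with the binomial formula. The helper name `product_volumes` is illustrative, and exact rational arithmetic is used so that the comparisons are exact.

```python
from fractions import Fraction
from math import comb

def product_volumes(v_c, v_k):
    """Intrinsic volumes of C x K from those of C and K, following the product rule (5.4)."""
    out = [Fraction(0)] * (len(v_c) + len(v_k) - 1)
    for i, a in enumerate(v_c):
        for j, b in enumerate(v_k):
            out[i + j] += a * b
    return out

ray = [Fraction(1, 2), Fraction(1, 2)]  # intrinsic volumes of the ray R_+ inside R^1
d = 6
orthant = [Fraction(1)]                 # the trivial cone {0} inside R^0
for _ in range(d):
    orthant = product_volumes(orthant, ray)

binomial = [Fraction(comb(d, k), 2 ** d) for k in range(d + 1)]

if __name__ == "__main__":
    print("convolution matches binomial formula:", orthant == binomial)
    print("sum of volumes (5.1):", sum(orthant))
    print("even-index sum (5.3):", sum(orthant[0::2]))
```

The polar of the orthant is its reflection through the origin, which has the same intrinsic volumes, so the polarity law (5.2) appears here as the symmetry $v_k = v_{d-k}$.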

Partial  sums  of  intrinsic  volumes  play  a  central  role  in  the  kinematic  theory  of  convex  cones,  described 
below.  With  an  eye  toward  these  developments,  we  make  the  following  definition. 

Definition 5.5 (Tail functionals). Let $C \in \mathcal{C}_d$ be a closed convex cone. For each $k = 0, 1, 2, \dots, d$, the $k$th tail
functional is given by

$$ t_k(C) := v_k(C) + v_{k+1}(C) + \cdots = \sum_{j=k}^{d} v_j(C). \tag{5.5} $$

The $k$th half-tail functional is defined as

$$ h_k(C) := v_k(C) + v_{k+2}(C) + \cdots = \sum_{\substack{j=k \\ j-k \text{ even}}}^{d} v_j(C). \tag{5.6} $$

We  require  an  interlacing  inequality  for  tail  functionals. 
Proposition 5.6 (Interlacing). For each closed convex cone $C \in \mathcal{C}_d$ that is not a linear subspace,

$$ 2 h_k(C) \geq t_k(C) \geq 2 h_{k+1}(C) \quad\text{for each } k = 0, 1, 2, \dots, d - 1. $$

We  establish  Proposition  5.6  in  Appendix  D. 
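The tail and half-tail functionals are simple partial sums, so the interlacing inequality is easy to test on a concrete cone. The sketch below does so for the orthant, using the binomial intrinsic volumes of Example 5.3 and exact rational arithmetic; the function names `tail` and `half_tail` and the choice $d = 9$ are illustrative.

```python
from fractions import Fraction
from math import comb

def tail(v, k):
    """Tail functional t_k = v_k + v_{k+1} + ... as in (5.5)."""
    return sum(v[k:], Fraction(0))

def half_tail(v, k):
    """Half-tail functional h_k = v_k + v_{k+2} + ... as in (5.6)."""
    return sum(v[k::2], Fraction(0))

d = 9
v = [Fraction(comb(d, j), 2 ** d) for j in range(d + 1)]  # orthant in R^d

# Proposition 5.6: 2 h_k >= t_k >= 2 h_{k+1} for k = 0, ..., d - 1.
interlaced = all(
    2 * half_tail(v, k) >= tail(v, k) >= 2 * half_tail(v, k + 1)
    for k in range(d)
)

if __name__ == "__main__":
    print("interlacing holds for the orthant:", interlaced)
```

Since $t_k = h_k + h_{k+1}$, the two inequalities together state that the half-tail functionals are nonincreasing in $k$.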

5.2.  The  formulas  of  conic  geometry.  We  continue  with  a  selection  of  more  sophisticated  results  from  conic 
integral  geometry.  These  formulas  provide  detailed  answers,  expressed  in  terms  of  conic  intrinsic  volumes,  to 
the  geometric  questions  posed  at  the  beginning  of  Section  5.  For  this  discussion,  we  introduce  a  family  of 
geometric  functions  that  also  plays  a  key  role  in  our  analysis. 

Definition 5.7 (Tropic functions). Let $L_k$ be a $k$-dimensional subspace of $\mathbb{R}^d$. Define

$$ I_k^d(\varepsilon) := \mathbb{P}\{\|\Pi_{L_k}(\theta)\|^2 \geq \varepsilon\} \quad\text{for } \varepsilon \in [0, 1], \tag{5.7} $$

where $\theta$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$.

Basic geometric reasoning reveals that $I_k^d(\varepsilon)$ is the proportion of points on the sphere $S^{d-1}$ that lie within an
angle $\arccos(\sqrt{\varepsilon})$ of the subspace $L_k$. Our terminology derives from the approximate geographical fact that
the tropics lie within a fixed angle (23° 26') of the equator; the usual term regularized incomplete beta function
is longer and less evocative.
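The tropic function can be estimated directly from its definition (5.7). The sketch below draws a uniform point on the sphere by normalizing a standard normal vector and, by rotation invariance, takes $L_k$ to be the span of the first $k$ coordinates. The function name `tropic_mc` and the sample sizes are illustrative assumptions; a closed-form evaluation through the incomplete beta function would be more accurate.

```python
import random

def tropic_mc(k, d, eps, trials=20000, seed=0):
    """Monte Carlo estimate of I_k^d(eps) = P{ ||Pi_{L_k}(theta)||^2 >= eps }."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        head = sum(x * x for x in g[:k])   # squared norm of the projection, before normalizing
        total = sum(x * x for x in g)      # squared norm of g
        if head >= eps * total:            # same event as ||Pi_{L_k}(g / ||g||)||^2 >= eps
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print("I_1^2(1/2)  ~", tropic_mc(1, 2, 0.5))     # a line in the plane
    print("I_5^20(1/4) ~", tropic_mc(5, 20, 0.25))   # evaluated at eps = k/d
```

For a line in the plane, the event is $\cos^2(\text{angle}) \geq \tfrac{1}{2}$, which has probability exactly one-half; the first estimate should land close to that value.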

The core fact in conic integral geometry is the spherical Steiner formula [Her43, All48, San50], which
describes the fraction of points on the sphere that lie at most a fixed angle from a closed convex cone.

Fact 5.8 (Spherical Steiner formula). Let $C \in \mathcal{C}_d$ be a closed convex cone. For each $\varepsilon \in [0, 1]$,

$$ \mathbb{P}\{\|\Pi_C(\theta)\|^2 \geq \varepsilon\} = \sum_{k=0}^{d} v_k(C) \, I_k^d(\varepsilon). \tag{5.8} $$

The  spherical  Steiner  formula  often  serves  as  the  definition  of  conic  intrinsic  volumes;  it  can  also  be  derived 
from  the  definition  here.  For  a  modern  proof  of  Fact  5.8  in  the  spirit  of  this  work,  see  [SW08,  Thm  6.5.1]. 

The  second  essential  result  is  the  conic  kinematic  formula,  which  provides  an  exact  expression  for  the 
probability  that  a  randomly  oriented  cone  strikes  a  fixed  cone. 

Fact 5.9 (Conic kinematic formula). Let $C, K \in \mathcal{C}_d$ be closed convex cones, and assume that $C$ is not a subspace.
Then

$$ \mathbb{P}\{C \cap QK \neq \{0\}\} = 2 h_{d+1}(C \times K). \tag{5.9} $$

For a linear subspace $L_{d-m} \subset \mathbb{R}^d$ with dimension $d - m$, this expression reduces to the Crofton formula

$$ \mathbb{P}\{C \cap Q L_{d-m} \neq \{0\}\} = 2 h_{m+1}(C). \tag{5.10} $$

See  [SW08,  p.  261]  for  a  proof  of  Fact  5.9.  In  Section  7,  we  use  deep  properties  of  the  intrinsic  volumes  to 
produce  an  approximate  version  of  the  conic  kinematic  formula,  which  ultimately  delivers  detailed  information 
about  phase  transitions. 

Remark 5.10 (Extended kinematic formula). By induction, the kinematic formula generalizes to a family
$C, K_1, \dots, K_r \in \mathcal{C}_d$ of closed convex cones where $C$ is not a subspace:

$$ \mathbb{P}\{C \cap Q_1 K_1 \cap \cdots \cap Q_r K_r \neq \{0\}\} = 2 h_{rd+1}(C \times K_1 \times \cdots \times K_r). \tag{5.11} $$

Each matrix $Q_i$ is an independent, random orthogonal basis. This result can be used to analyze demixing
problems with more than two constituents.

5.3.  Revisiting  the  statistical  dimension.  The  machinery  of  conic  integral  geometry  offers  several  new 
insights  about  the  statistical  dimension,  and  it  underscores  the  analogy  with  the  dimension  of  a  linear 
subspace.  The  next  proposition  demonstrates  that  the  statistical  dimension  of  a  cone  is  the  mean  of  the 
random  variable  on  {0, 1,2,...,  d}  whose  distribution  is  given  by  the  intrinsic  volumes  of  the  cone.  This  fact 
motivated  us  to  select  the  terminology  “statistical”  dimension. 

Proposition 5.11 (Statistical dimension as mean intrinsic volume). For each closed convex cone $C \in \mathcal{C}_d$,

$$ \delta(C) = \sum_{k=1}^{d} k \, v_k(C). \tag{5.12} $$
Proof. The spherical formulation (4.2) of statistical dimension shows that

$$ \delta(C) = d \, \mathbb{E}\big[\|\Pi_C(\theta)\|^2\big] = d \int_0^1 \mathbb{P}\{\|\Pi_C(\theta)\|^2 \geq \varepsilon\} \, d\varepsilon. $$

We have used integration by parts to express the expectation as an integral of tail probabilities. The Steiner
formula (5.8) and the definition (5.7) of the tropic function allow us to write the probability as a sum:

$$ \delta(C) = d \sum_{k=1}^{d} v_k(C) \int_0^1 \mathbb{P}\{\|\Pi_{L_k}(\theta)\|^2 \geq \varepsilon\} \, d\varepsilon = \sum_{k=1}^{d} v_k(C) \big( d \, \mathbb{E}\big[\|\Pi_{L_k}(\theta)\|^2\big] \big), $$

where $L_k$ is an arbitrary $k$-dimensional subspace. A second application of (4.2) shows that the parenthesis is
the statistical dimension of $L_k$, and Proposition 4.1(4) provides that $\delta(L_k) = k$. □
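As a quick consistency check of (5.12), consider the nonnegative orthant, whose intrinsic volumes are the binomial weights $2^{-d}\binom{d}{k}$ of Example 5.3. Using the elementary identity $k \binom{d}{k} = d \binom{d-1}{k-1}$, the mean of the intrinsic volume distribution is

$$ \sum_{k=1}^{d} k \, 2^{-d} \binom{d}{k} = d \, 2^{-d} \sum_{k=1}^{d} \binom{d-1}{k-1} = d \, 2^{-d} \cdot 2^{d-1} = \tfrac{1}{2} d, $$

which agrees with the statistical dimension of the orthant quoted from Table 4.1 in Example 5.3.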

Proposition 5.11 has a significant consequence. Each intrinsic volume $v_k$ is a valuation on $\mathcal{C}_d$, so the
statistical dimension is also a valuation on $\mathcal{C}_d$. Equivalently, (i) the statistical dimension of the trivial cone
$\delta(\{0\}) = 0$, and (ii) if $C, K \in \mathcal{C}_d$ and $C \cup K \in \mathcal{C}_d$, then we have the inclusion-exclusion rule

$$ \delta(C \cup K) = \delta(C) + \delta(K) - \delta(C \cap K). $$

This  property  is  analogous  with  the  inclusion-exclusion  law  for  the  dimension  of  a  subspace. 

The long-standing spherical Hadwiger conjecture posits that each continuous, rotation-invariant valuation on
$\mathcal{C}_d$ can be written as a linear combination of the conic intrinsic volumes. If this conjecture holds, then every
such valuation is determined by its values on linear subspaces. Under this surmise, we obtain a fundamental
characterization of the statistical dimension.

Proposition 5.12 (Statistical dimension is canonical). If the spherical Hadwiger conjecture holds, then the
statistical dimension $\delta$ is the unique continuous, rotation-invariant valuation on $\mathcal{C}_d$ that satisfies $\delta(L) = \dim(L)$
for each subspace $L \subset \mathbb{R}^d$.

A  weaker  version  of  the  spherical  Hadwiger  conjecture  does  hold  for  geometric  parameters  known  as  curvature 
measures,  which  provides  a  rigorous  claim  that  the  statistical  dimension  is  canonical  under  an  additional 
technical  assumption  of  “localizability”;  see  [SW08,  p.  254  and  Thm.  6.5.4].  For  a  discussion  of  the  spherical 
Hadwiger  conjecture,  see  the  works  [McM93,  p.  976],  [KR97,  Sec.  11.5],  and  [SW08,  p.  263].  The  conjecture 
currently stands open for $d \geq 4$.

6.  Intrinsic  volumes  concentrate  near  the  statistical  dimension 

The  main  technical  result  in  this  paper  describes  a  deep  new  property  of  conic  intrinsic  volumes.  The 
intrinsic  volumes  of  a  cone  concentrate  near  the  statistical  dimension  of  the  cone  on  a  scale  determined  by 
the  statistical  dimension. 

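Theorem 6.1 (Concentration of intrinsic volumes). Let $C$ be a closed convex cone. Define the transition width

$$ \omega(C) := \sqrt{\delta(C) \wedge \delta(C^\circ)}, $$

and introduce the function

$$ p_C(\lambda) := 4 \exp\left( \frac{-\lambda^2/8}{\omega^2(C) + \lambda} \right) \quad\text{for } \lambda \geq 0. \tag{6.1} $$

Then

$$ k_- \leq \delta(C) - \lambda + 1 \quad\Longrightarrow\quad t_{k_-}(C) \geq 1 - p_C(\lambda); \tag{6.2} $$

$$ k_+ \geq \delta(C) + \lambda \quad\Longrightarrow\quad t_{k_+}(C) \leq p_C(\lambda). \tag{6.3} $$

The tail functional $t_k$ is defined in (5.5). The operator $\wedge$ returns the minimum of two numbers.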

In other words, the sequence $\{t_k(C) : k = 0, 1, 2, \dots, d\}$ of tail functionals drops from one to zero near the
statistical dimension $\delta(C)$, and the transition occurs over a range of $O(\omega(C))$ indices. Owing to the fact (5.1)
that the intrinsic volumes form a probability distribution, we must conclude that the intrinsic volumes $v_k(C)$
are all negligible in size, except for those whose index $k$ is close to the statistical dimension $\delta(C)$. We learn
that the intrinsic volumes of a convex cone $C$ with statistical dimension $\delta(C)$ are qualitatively similar to the
intrinsic volumes of a subspace with dimension $[\delta(C)]$.
Theorem 6.1 contains additional information about the rate at which the tail functionals of a cone transit
from one to zero. To extract this information, it helps to note the weaker inequality

$$ p_C(\lambda) \leq \begin{cases} 4 \mathrm{e}^{-\lambda^2/(16 \omega^2(C))}, & 0 \leq \lambda \leq \omega^2(C), \\ 4 \mathrm{e}^{-\lambda/16}, & \lambda \geq \omega^2(C). \end{cases} \tag{6.4} $$


We see that (6.2) and (6.3) are vacuous until $\lambda \approx 4\,\omega(C)$. As $\lambda$ increases, the function $p_C(\lambda)$ decays like the tail
of a Gaussian random variable with standard deviation $\leq 2\sqrt{2}\,\omega(C)$. When $\lambda$ reaches $\omega^2(C)$, the decay slows
to match the tail of an exponential random variable with mean $\leq 16$. In particular, the behavior of the tail
functionals depends on the intrinsic properties of the cone, rather than the ambient dimension.
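To get a feel for these scales, the sketch below codes the function $p_C$ from (6.1) alongside the piecewise bound (6.4) and locates the smallest $\lambda$ at which the estimate becomes nontrivial. It assumes the totality law $\delta(C) + \delta(C^\circ) = d$, so that $\omega^2(C)$ can be computed from $\delta(C)$ and $d$; the function names and the example values are illustrative.

```python
import math

def transition_width_sq(delta, d):
    """omega^2(C) = delta(C) ∧ delta(C°), using delta(C°) = d - delta(C)."""
    return min(delta, d - delta)

def p_bound(lam, omega_sq):
    """The function p_C(lambda) of (6.1)."""
    return 4.0 * math.exp(-(lam ** 2 / 8.0) / (omega_sq + lam))

def p_bound_weak(lam, omega_sq):
    """The weaker piecewise bound (6.4)."""
    if lam <= omega_sq:
        return 4.0 * math.exp(-lam ** 2 / (16.0 * omega_sq))
    return 4.0 * math.exp(-lam / 16.0)

def first_nontrivial_lambda(omega_sq, step=0.01):
    """Smallest grid value of lambda with p_C(lambda) < 1."""
    lam = 0.0
    while p_bound(lam, omega_sq) >= 1.0:
        lam += step
    return lam

if __name__ == "__main__":
    omega_sq = transition_width_sq(delta=50.0, d=100)
    lam0 = first_nontrivial_lambda(omega_sq)
    print("omega(C) =", math.sqrt(omega_sq))
    print("p_C(lambda) < 1 first at lambda ~", lam0, "=", lam0 / math.sqrt(omega_sq), "* omega(C)")
```

With $\omega^2(C) = 50$, the bound becomes informative at $\lambda$ a little above $4\,\omega(C)$, in line with the discussion above.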

In  Section  6.1,  we  outline  some  connections  between  Theorem  6.1  and  classical  inequalities  from  Euclidean 
integral geometry. In Section 6.2, we argue that the result is nearly optimal. Afterward, in Sections 6.3 and 6.4,
we  summarize  the  intuition  behind  the  proof  of  Theorem  6.1,  and  we  follow  up  with  the  technical  details. 
Later,  Sections  7-9  highlight  applications  in  conic  geometry,  optimization  theory,  and  signal  processing. 


6.1. Parallels with Euclidean integral geometry. Motivated by analogies between conic and Euclidean
integral geometry, Schneider & Weil [SW08, p. 263] ask what relationships hold among the conic intrinsic
volumes $v_k$ of a convex cone. To the best of our knowledge, the literature contains only two nontrivial
results [GHS02]: Among all cones in $\mathcal{C}_d$ with a fixed value of $v_d$, a circular cone minimizes $v_{d-1}$ and
maximizes $v_0$. This fact is a consequence of spherical isoperimetry.

Our work provides a rich family of inequalities relating conic intrinsic volumes and tail functionals. For any
convex cone $C$, Theorem 6.1 implies that

$$ v_k(C) \leq s_k(C) := 4 \exp\left( \frac{-(k - \delta(C))^2/8}{\omega^2(C) + |k - \delta(C)|} \right) \quad\text{for each } k = 0, 1, 2, \dots, d. $$

This bound also relies on the trivial inequalities $v_k(C) \leq t_k(C)$ and $v_k(C) \leq 1 - t_{k+1}(C)$. Observe that the
sequence $\{s_k(C) : k = 0, 1, 2, \dots, d\}$ of upper bounds is strictly log-concave:

$$ s_k^2(C) > s_{k-1}(C) \cdot s_{k+1}(C) \quad\text{for } k = 1, 2, 3, \dots, d - 1. $$

This point follows from the fact that $u \mapsto u^2/(\omega^2(C) + |u|)$ is strictly convex for every nontrivial convex cone
$C$. (Indeed, the Gauss-Bonnet formula (5.3) implies that $\tfrac{1}{2} \leq \delta(C) \leq d - \tfrac{1}{2}$, which ensures that $\omega^2(C) \geq \tfrac{1}{2}$.) We
see that the sequence of conic intrinsic volumes is dominated by a log-concave sequence; cf. the conjecture
in [Ame11, Conj. 4.4.16].
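The domination and log-concavity claims can be checked numerically on the orthant. The sketch below takes $s_k(C) = p_C(|k - \delta(C)|)$, that is, the bound that Theorem 6.1 yields with $\lambda = |k - \delta(C)|$, and works with logarithms to avoid floating-point ties. The cone ($\mathbb{R}^{100}_+$, with $\delta = \omega^2 = 50$) and the helper name `log_s` are illustrative.

```python
from math import comb, exp, log

d = 100
delta = d / 2        # statistical dimension of the orthant in R^d
omega_sq = d / 2     # omega^2 = delta(C) ∧ delta(C°) = d/2 for the orthant
v = [comb(d, k) / 2 ** d for k in range(d + 1)]   # intrinsic volumes (Example 5.3)

def log_s(k):
    """Logarithm of s_k(C) = 4 exp( -(k - delta)^2 / 8 / (omega^2 + |k - delta|) )."""
    gap = abs(k - delta)
    return log(4.0) - (gap ** 2 / 8.0) / (omega_sq + gap)

# Each intrinsic volume sits below the corresponding upper bound.
dominated = all(v[k] <= exp(log_s(k)) for k in range(d + 1))

# Strict log-concavity: 2 log s_k > log s_{k-1} + log s_{k+1}.
log_concave = all(2 * log_s(k) > log_s(k - 1) + log_s(k + 1) for k in range(1, d))

if __name__ == "__main__":
    print("v_k <= s_k for all k:", dominated)
    print("s_k strictly log-concave:", log_concave)
```

The comparison also shows how much slack the bound carries near the center: the largest intrinsic volume of this orthant is about $0.08$, while $s_k$ equals $4$ at $k = \delta(C)$.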

The  Euclidean  intrinsic  volumes  of  a  convex  body  also  form  a  log-concave  sequence,  which  is  a  corollary  of 
the  Alexandrov-Fenchel  inequalities.  Log-concavity  of  the  Euclidean  intrinsic  volumes  has  many  fundamental 
consequences,  including  the  usual  isoperimetric  inequality  for  convex  bodies,  the  Brunn-Minkowski  inequality, 
and  the  Urysohn  inequality  [Sch93,  Chap.  6].  Theorem  6.1  delivers  a  system  of  inequalities  for  conic  intrinsic 
volumes  that  parallel  these  deep  classical  results. 


6.2. Optimality of Theorem 6.1. By considering circular cones, we discover that Theorem 6.1 is nearly
optimal. Let us proceed with a heuristic discussion that conveys the key ideas. Consider the cone $C = \mathrm{Circ}_d(\alpha)$.
For simplicity, we assume that $d = 2(n + 1)$ for a large integer $n$, and we abbreviate $q = \sin^2(\alpha)$. Proposition 4.3
shows that the statistical dimension $\delta(C) \approx d \sin^2(\alpha) \approx 2nq$ and $\delta(C^\circ) \approx 2n(1 - q)$. According to [Ame11,
Ex. 4.4.8], the odd intrinsic volumes of $C$ satisfy

$$ v_{2k+1}(C) = \frac{1}{2} \binom{n}{k} q^k (1 - q)^{n-k} \quad\text{for } k = 0, 1, 2, \dots, n. $$

In other words, $2 v_{2k+1}(C) = \mathbb{P}\{X = k\}$ where $X \sim \mathrm{Binomial}(n, q)$. This observation invites us to study these
cones using probabilistic methods.

First, we approximate the width of the region over which the tail functionals of $C$ change from one to zero.
The interlacing result, Proposition 5.6, indicates that

$$ t_{2k}(C) \approx 2 h_{2k+1}(C) = 2 \sum_{i=k}^{n} v_{2i+1}(C) = \mathbb{P}\{X \geq k\}. $$
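Under appropriate assumptions on $q$, $k$, and $n$, the Central Limit Theorem implies

$$ \mathbb{P}\{X \geq k\} = \mathbb{P}\left\{ \frac{X - nq}{\sqrt{nq(1-q)}} \geq \frac{k - nq}{\sqrt{nq(1-q)}} \right\} \approx \mathbb{P}\left\{ Z \geq \frac{k - nq}{\sqrt{nq(1-q)}} \right\}, $$

where $Z$ is a standard normal variable. For the index $2k = 2nq + \lambda \approx \delta(C) + \lambda$, we see that

$$ t_{2k}(C) \approx \mathbb{P}\left\{ Z \geq \frac{\lambda}{2\sqrt{nq(1-q)}} \right\} \approx \exp\left( \frac{-\lambda^2/4}{2nq(1-q)} \right) \approx \exp\left( \frac{-\lambda^2/4}{((1-q)\delta(C)) \wedge (q\,\delta(C^\circ))} \right) \quad\text{for } \lambda \gg 0. $$

The second approximation follows from the normal tail estimate $\mathbb{P}\{Z \geq z\} \approx \mathrm{e}^{-z^2/2}$. Similarly, for the index
$2k = 2nq - \lambda \approx \delta(C) - \lambda$, we have

$$ t_{2k}(C) \approx 1 - \exp\left( \frac{-\lambda^2/4}{((1-q)\delta(C)) \wedge (q\,\delta(C^\circ))} \right) \quad\text{for } \lambda \gg 0. $$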

In other words, the width of the transition region for the tail functionals of the circular cone $C$ really does
have order $\omega(C)$. Furthermore, considering the case where $q$ is close to zero, we discover that the constant in
the exponent of (6.1) lies within a factor two of the best possible.
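The Central Limit Theorem step in this heuristic is easy to examine numerically. The sketch below compares the exact binomial tail $\mathbb{P}\{X \geq k\}$, which approximates $t_{2k}(C)$ for the circular cone, with the Gaussian tail used above. The parameters $n = 500$ and $q = 0.3$ and the function names are illustrative; no continuity correction is applied, so a small discrepancy is expected.

```python
from math import comb, erfc, sqrt

def binomial_tail(n, q, k):
    """Exact P{X >= k} for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q ** i * (1.0 - q) ** (n - i) for i in range(k, n + 1))

def gaussian_tail(z):
    """P{Z >= z} for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2.0))

n, q = 500, 0.3
center = int(n * q)                 # n q = 150
scale = sqrt(n * q * (1.0 - q))     # standard deviation of X, about 10.25

if __name__ == "__main__":
    for shift in (-20, -10, 0, 10, 20):
        k = center + shift
        print(f"k = {k}: exact {binomial_tail(n, q, k):.4f}, "
              f"Gaussian {gaussian_tail((k - n * q) / scale):.4f}")
```

In this example the tail moves from near one to near zero as $k$ ranges over a few multiples of $\sqrt{nq(1-q)} \approx 10$, that is, as the index $2k$ ranges over a few multiples of $\omega(C) \approx \sqrt{2nq} \approx 17$.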

To argue that $p_C$ cannot have subgaussian decay when $\lambda$ is large relative to the statistical dimension $\delta(C)$,
we consider a Poisson limit of the binomial variable. Suppose that $q = b/n$ for a constant $b$, so the statistical
dimension $\delta(C) \approx 2b$. For $2k = 2b(1 + \lambda) \approx (1 + \lambda)\delta(C)$,

$$ t_{2k}(C) \approx \mathbb{P}\{X \geq k\} = \mathbb{P}\{X \geq b(1 + \lambda)\} \approx \mathrm{e}^{\lambda - (1+\lambda)\log(1+\lambda)}. $$

The tail estimate follows, for example, by applying Cramér's Theorem [DZ10, Thm. 2.2.3] to the binomial
random variable $X$. In other words, for a very small circular cone, the tail functionals decay only a little faster
than subexponential when the tail index $k$ is a multiple of the statistical dimension.


6.3. Heuristic proof of Theorem 6.1. The basic ideas behind the argument are easy to summarize, but the
details demand some effort. Let $C \in \mathcal{C}_d$ be a closed convex cone. Recall the Steiner formula (5.8):

$$ \mathbb{P}\{\|\Pi_C(\theta)\|^2 \geq \varepsilon\} = \sum_{k=0}^{d} v_k(C) \, I_k^d(\varepsilon), \tag{6.5} $$

where $\theta$ is uniformly distributed on the sphere $S^{d-1}$ and the tropic function $I_k^d$ is defined in (5.7).

Concentration of measure on the sphere implies that the random variable $\|\Pi_C(\theta)\|^2$ is typically very close
to its expected value $\delta(C)/d$, determined by (4.2). Thus, the left-hand side of (6.5) is very close to one when
$\varepsilon d < \delta(C)$ and very close to zero when $\varepsilon d > \delta(C)$.

As for the right-hand side of (6.5), recall that the tropic function $I_k^d(\varepsilon)$ is the proportion of points on the
sphere within a distance of $\sqrt{1 - \varepsilon}$ from a fixed $k$-dimensional subspace. Once again, concentration of measure
ensures that $I_k^d(\varepsilon)$ is close to zero when $k < \varepsilon d$ and close to one when $k > \varepsilon d$. Therefore, the sum on the
right-hand side of (6.5) is approximately equal to the tail functional $t_{\varepsilon d}(C)$.

Combining these two observations, we conclude that the sequence $\{t_k(C) : k = 0, 1, 2, \dots, d\}$ of tail functionals
makes a sharp transition from one to zero when $k \approx \delta(C)$. It remains to make this reasoning rigorous and to
determine the range of $k$ over which the transition takes place.

6.4. Proof of Theorem 6.1. Let $C \in \mathcal{C}_d$ be a closed convex cone, and define $\varepsilon := k_+/d$. The first part of the
argument requires a technical lemma that we prove in Appendix D. This result quantifies how much of the
sphere in $\mathbb{R}^d$ lies within an angle $\arccos(\sqrt{k/d})$ of a $k$-dimensional subspace.

Lemma 6.2 (The tropics). For all integers $0 \leq k \leq d$, the tropic function $I_k^d(k/d) \geq 0.3$.

We begin by expressing the tail functional $t_{k_+}(C)$ in terms of the probability that a spherical variable lies near
the cone $C$.

$$ \begin{aligned} t_{k_+}(C) &= \sum_{k=k_+}^{d} v_k(C) \big[ I_k^d(\varepsilon) + (1 - I_k^d(\varepsilon)) \big] \\ &\leq \sum_{k=0}^{d} v_k(C) \, I_k^d(\varepsilon) + \big(1 - I_{k_+}^d(\varepsilon)\big) \sum_{k=k_+}^{d} v_k(C) \\ &\leq \mathbb{P}\{\|\Pi_C(\theta)\|^2 \geq \varepsilon\} + 0.7 \, t_{k_+}(C). \end{aligned} \tag{6.6} $$
The first identity is the definition (5.5) of the tail functional. To reach the second line, we inspect the
definition (5.7) to see that $I_k^d(\varepsilon)$ is an increasing function of $k$ when the other parameters are fixed, so that
$1 - I_k^d(\varepsilon) \leq 1 - I_{k_+}^d(\varepsilon)$ for each $k \geq k_+$. In the third line, we invoke the Steiner formula (5.8) to rewrite the
first sum. The bound for the second sum follows from the definition (5.5) of the tail functional and from
Lemma 6.2, seeing that $\varepsilon = k_+/d$.

Rearranging (6.6), we obtain the bound

$$ t_{k_+}(C) \leq 4 \, \mathbb{P}\{d \, \|\Pi_C(\theta)\|^2 \geq d\varepsilon\} \leq 4 \, \mathbb{P}\{d \, \|\Pi_C(\theta)\|^2 \geq \delta(C) + \lambda\}. \tag{6.7} $$

The last inequality depends on the fact that $\varepsilon = k_+/d$ and the definition (6.3) of $k_+$. In other words, the tail
functional is dominated by the probability that a random point on the sphere is close to the cone.

To  estimate  the  probability  in  (6.7),  we  need  a  tail  bound  for  the  squared  norm  of  the  projection  of  a 
spherical  variable  onto  a  cone.  This  result  is  encapsulated  in  the  following  lemma.  The  approach  is  more  or 
less  standard,  so  we  defer  the  details  to  Appendix  D. 

Lemma 6.3 (Tail bound for conic projections). For each closed convex cone $C \in \mathcal{C}_d$,

$$ \mathbb{P}\{d \, \|\Pi_C(\theta)\|^2 \geq \delta(C) + \lambda\} \leq \exp\left( \frac{-\lambda^2/8}{\omega^2(C) + \lambda} \right) \quad\text{for } \lambda \geq 0. \tag{6.8} $$

Introducing (6.8) into (6.7), we reach the upper bound (6.3) on the tail functional.

To develop the lower bound (6.2) on the tail functional $t_{k_-}(C)$, we use a polarity argument. Note that

$$ t_{k_-}(C) = \sum_{k=k_-}^{d} v_k(C) = \sum_{k=0}^{d-k_-} v_k(C^\circ) = 1 - t_{d-k_-+1}(C^\circ). \tag{6.9} $$

The first identity is the definition (5.5) of the tail functional $t_{k_-}(C)$. The second relation holds because of the
fact (5.2) that polarity reverses intrinsic volumes, and the last part relies on (5.5) and the property (5.1) that
the intrinsic volumes sum to one. Owing to the totality law (4.5) and the definition (6.2) of $k_-$,

$$ d - k_- + 1 = \delta(C^\circ) + \delta(C) - k_- + 1 \geq \delta(C^\circ) + \lambda. $$

Therefore, we may apply (6.3) to obtain an upper bound on the tail functional $t_{d-k_-+1}(C^\circ)$. Substitute this
bound into (6.9) to establish the lower bound on the tail functional $t_{k_-}(C)$ stated in (6.2).

7.  Approximate  kinematic  bounds 

We  are  now  prepared  to  establish  an  approximate  version  of  the  conic  kinematic  formula,  expressed  in 
terms  of  the  statistical  dimension.  Most  of  the  applied  theorems  in  this  paper  ultimately  depend  on  this  result. 
The  proof  combines  the  exact  kinematic  formula  (5.9)  with  the  concentration  of  intrinsic  volumes,  guaranteed 
by  Theorem  6.1. 

Theorem 7.1 (Approximate kinematics). Assume that $\lambda \geq 0$. Let $C \subset \mathbb{R}^d$ be a convex cone that is not a subspace.
For a $(d - m)$-dimensional subspace $L_{d-m}$, it holds that

$$ \begin{aligned} m \geq \delta(C) + \lambda \quad&\Longrightarrow\quad \mathbb{P}\{C \cap Q L_{d-m} \neq \{0\}\} \leq p_C(\lambda); \\ m \leq \delta(C) - \lambda \quad&\Longrightarrow\quad \mathbb{P}\{C \cap Q L_{d-m} \neq \{0\}\} \geq 1 - p_C(\lambda). \end{aligned} \tag{7.1} $$

For an arbitrary convex cone $K \subset \mathbb{R}^d$, it holds that

$$ \begin{aligned} \delta(C) + \delta(K) \leq d - 2\lambda \quad&\Longrightarrow\quad \mathbb{P}\{C \cap QK \neq \{0\}\} \leq p_C(\lambda) + p_K(\lambda); \\ \delta(C) + \delta(K) \geq d + 2\lambda \quad&\Longrightarrow\quad \mathbb{P}\{C \cap QK \neq \{0\}\} \geq 1 - (p_C(\lambda) + p_K(\lambda)). \end{aligned} \tag{7.2} $$

The functions $p_C$ and $p_K$ are defined by the expression (6.1).

Theorem 7.1 has an attractive interpretation. The first statement (7.1) shows that a randomly oriented
subspace with codimension $m$ is unlikely to share a ray with a fixed cone $C$, provided that the codimension
$m$ is larger than the statistical dimension $\delta(C)$ of the cone. When the codimension $m$ is smaller than the
statistical dimension $\delta(C)$, the subspace and the cone are likely to share a ray.

The transition in behavior expressed in (7.1) takes place when the codimension $m$ of the subspace changes
by about $\omega(C) = \sqrt{\delta(C) \wedge \delta(C^\circ)}$. This point explains why the empirical success curves taper in the corners of the
graphs in Figure 2.2. Indeed, on the bottom-left side of each panel, the relevant descent cone is small; on the
top-right side of each panel, the descent cone is large, so its polar is small. In these regimes, the result (7.1)
shows that the phase transition must occur over a narrow range of codimensions.
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The  second  statement  (7.2)  provides  analogous  results  for  the  probability  that  a  randomly  oriented  cone 
shares  a  ray  with  a  fixed  cone.  This  event  is  unlikely  when  the  total  statistical  dimension  of  the  two  cones 
is  smaller  than  the  ambient  dimension;  it  is  likely  to  occur  when  the  total  statistical  dimension  exceeds  the 
ambient  dimension. 

For the case of two cones, it is harder to analyze the size of the transition region. Since the probability bounds in (7.2) are controlled by the sum $p_C(\lambda) + p_K(\lambda)$, we can only be certain that the probability estimate depends on the larger of the two quantities. It follows that the width of the transition does not exceed the larger of $\omega(C) = \sqrt{\delta(C) \wedge \delta(C^\circ)}$ and $\omega(K) = \sqrt{\delta(K) \wedge \delta(K^\circ)}$. This observation is sufficient to explain why the empirical success curves taper at the top-left and bottom-right of the graphs in Figure 2.3. Indeed, these are the regions where one of the descent cones is small and the polar of the other descent cone is small.

In  the  next  subsection,  we  proceed  with  the  proof  of  Theorem  7.1.  Afterward,  we  derive  Theorem  I  from  a 
similar,  but  slightly  easier  argument. 

7.1. Proof of Theorem 7.1. We may assume that $C$ and $K$ are both closed because $\mathbb{P}\{ C \cap Q K \neq \{0\} \} = \mathbb{P}\{ \overline{C} \cap Q \overline{K} \neq \{0\} \}$, where the overline denotes closure. This is a subtle point that follows from the discussion of touching probabilities located in [SW08, pp. 258-259].

Let us begin with the first set (7.1) of results, concerning the probability that a randomly oriented subspace strikes a fixed cone. Consider the first implication, which operates when $m \geq \delta(C) + \lambda$. The Crofton formula (5.10) shows that

\[ \mathbb{P}\{ C \cap Q L_{d-m} \neq \{0\} \} = 2 h_{m+1}(C) \leq t_m(C), \]

where the inequality depends on the interlacing result, Proposition 5.6. The concentration of intrinsic volumes, Theorem 6.1, demonstrates that the tail functional satisfies the bound

\[ t_m(C) \leq p_C(\lambda) \quad \text{when } m \geq \delta(C) + \lambda. \]

This completes the first bound. The second result, which holds when $m \leq \delta(C) - \lambda$, follows from a parallel argument.

The conic kinematic formula is required for the second set (7.2) of results, which concern the probability that a randomly oriented cone strikes a fixed cone. Consider the situation where $\delta(C) + \delta(K) \leq d - 2\lambda$. The kinematic formula (5.9) yields

\[ \mathbb{P}\{ C \cap Q K \neq \{0\} \} = 2 h_{d+1}(C \times K) \leq t_d(C \times K), \tag{7.3} \]

where the inequality follows from Proposition 5.6.

We  rely  on  a  simple  lemma  to  bound  the  tail  functional  of  the  product  in  terms  of  the  individual  tail 
functionals.  The  proof  appears  in  Appendix  D. 

Lemma 7.2 (Tail functionals of a product). Let $C$ and $K$ be closed convex cones. Then

\[ t_{\lceil \delta(C) + \delta(K) + 2\lambda \rceil}(C \times K) \leq t_{\lceil \delta(C) + \lambda \rceil}(C) + t_{\lceil \delta(K) + \lambda \rceil}(K). \]

Since the tail functionals are weakly decreasing, our assumption that $\delta(C) + \delta(K) \leq d - 2\lambda$ implies that

\[ t_d(C \times K) \leq t_{\lceil \delta(C) + \delta(K) + 2\lambda \rceil}(C \times K) \leq t_{\lceil \delta(C) + \lambda \rceil}(C) + t_{\lceil \delta(K) + \lambda \rceil}(K). \]

Theorem 6.1 delivers an upper bound of $p_C(\lambda) + p_K(\lambda)$ for the right-hand side. Introduce these bounds into the probability inequality (7.3) to complete the proof of the first statement in (7.2). The second result follows from an analogous argument.

7.2. Proof of Theorem I. The simplified kinematic bound of Theorem I involves an argument similar to the proof of Theorem 7.1. First, assume that $\delta(C) + \delta(K) \leq d - \lambda$. The product rule (4.6) for cones states that $\delta(C \times K) = \delta(C) + \delta(K)$, so the implication (6.3) in Theorem 6.1 yields

\[ t_d(C \times K) \leq p_{C \times K}(\lambda) \leq 4 \exp\left( \frac{-\lambda^2/8}{d + \lambda} \right) \leq 4 e^{-\lambda^2/(16 d)} \quad \text{for } 0 \leq \lambda \leq d. \tag{7.4} \]

The second relation holds because the totality rule (4.5) ensures that $\delta(C \times K) \leq d$ or $\delta((C \times K)^\circ) \leq d$. The third relation holds because $d + \lambda \leq 2d$ in the stated range. Substitute the inequality (7.4) into the kinematic bound (7.3). Then make the change of variables $\lambda \mapsto a_\eta \sqrt{d}$, where $a_\eta := 4 \sqrt{\log(4/\eta)}$, to obtain the estimate

\[ \mathbb{P}\{ C \cap Q K \neq \{0\} \} \leq \eta. \]

This establishes the first part of Theorem I. The argument for the second part is cut from the same pattern.
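
The last two steps of this argument can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the form of the concentration function (with $\omega^2$ replaced by its upper bound $d$) is an assumption read off from the chain (7.4), since the expression (6.1) is not reproduced in this section. It verifies that the middle term of (7.4) is dominated by $4 e^{-\lambda^2/(16d)}$ on the range $0 \leq \lambda \leq d$, and that the substitution $\lambda = a_\eta \sqrt{d}$ turns the latter into $\eta$.

```python
import math

def middle_bound(d, lam):
    # Middle term of (7.4): 4 * exp(-(lam^2 / 8) / (d + lam)).
    # Assumed form; (6.1) itself is not shown in this section.
    return 4.0 * math.exp(-(lam ** 2 / 8.0) / (d + lam))

def simplified_bound(d, lam):
    # Right-hand side of (7.4): 4 * exp(-lam^2 / (16 d)).
    return 4.0 * math.exp(-lam ** 2 / (16.0 * d))

def a_eta(eta):
    # Constant from the change of variables: a_eta = 4 * sqrt(log(4 / eta)).
    return 4.0 * math.sqrt(math.log(4.0 / eta))

d = 400        # illustrative ambient dimension (hypothetical value)
eta = 0.01     # illustrative target failure probability (hypothetical value)
lam = a_eta(eta) * math.sqrt(d)

# The substitution is only valid when lam lies in the range [0, d].
lam_in_range = 0.0 <= lam <= d
bound_at_lam = simplified_bound(d, lam)

# Check the last inequality of (7.4) on an integer grid of lam values.
chain_holds = all(
    middle_bound(d, x) <= simplified_bound(d, x) + 1e-15 for x in range(d + 1)
)

print(lam_in_range, chain_holds, bound_at_lam)
```

For small $\eta$ or small $d$ the value $a_\eta \sqrt{d}$ can exceed $d$, in which case the range condition in (7.4) fails and the simplified bound does not apply.
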

Figure 8.1: Phase transitions in random cone problems. For each cone $C_i$ in (8.2), we plot the empirical probability that the cone program (8.1) with random affine constraints is feasible. The solid gray curve traces the logistic fit to the data, and the finely dashed line is the empirical 50% success threshold, computed from the regression model. The coarsely dashed line marks the statistical dimension $\delta(C_i)$, which is the theoretical estimate for the location of the phase transition.


8.  Application:  Cone  programs  with  random  constraints 

The  concentration  of  intrinsic  volumes  has  far-reaching  consequences  for  the  theory  of  optimization.  This 
section  describes  a  new  type  of  phase  transition  phenomenon  that  appears  in  a  cone  program  with  random 
affine  constraints.  We  begin  with  a  theoretical  result,  and  then  we  exhibit  some  numerical  examples  that 
confirm  the  analysis. 

8.1. Cone programs. A cone program is a convex optimization problem with the following structure:

\[ \text{minimize} \quad \langle u, x \rangle \quad \text{subject to} \quad A x = b \quad \text{and} \quad x \in C, \tag{8.1} \]

where $C \subset \mathbb{R}^d$ is a closed convex cone. The decision variable $x \in \mathbb{R}^d$, and the problem data consists of a vector $u \in \mathbb{R}^d$, a matrix $A \in \mathbb{R}^{m \times d}$, and another vector $b \in \mathbb{R}^m$. This formalism includes several fundamental classes of convex programs:

(1) Linear programs. If $C = \mathbb{R}_+^d$, then (8.1) reduces to a linear program in standard form.

(2) Second-order cone programs. If $C = \mathbb{L}^{d+1}$, then (8.1) is a type of second-order cone program.

(3) Semidefinite programs. When $C = \mathbb{S}_+^{n \times n}$, we recover the class of (real) semidefinite programs.

In  addition  to  their  flexibility  and  modeling  power,  cone  programs  enjoy  effective  algorithms  and  a  crisp 
theory.  We  refer  to  [BTN01]  for  further  details. 

The cone program (8.1) can exhibit several interesting behaviors. Let us remind the reader of the terminology. A point $x$ that satisfies the constraints $A x = b$ and $x \in C$ is called a feasible point, and the cone program is infeasible when no feasible point exists. The cone program is unbounded when there exists a sequence $\{x_k\}$ of feasible points with the property $\langle u, x_k \rangle \to -\infty$.

Our theory allows us to analyze the properties of a random cone program. It turns out that the number $m$ of affine constraints controls whether the cone program is infeasible or unbounded.

Theorem 8.1 (Phase transitions in cone programming). Let $C \subset \mathbb{R}^d$ be a closed convex cone. Consider the cone program (8.1) where the vector $b \neq 0$ is fixed while the vector $u \in \mathbb{R}^d$ and the matrix $A \in \mathbb{R}^{m \times d}$ have independent standard normal entries. Then

\[
\begin{aligned}
m \leq \delta(C) - \lambda \quad &\Longrightarrow \quad \text{(8.1) is unbounded with probability} \geq 1 - p_C(\lambda); \\
m \geq \delta(C) + \lambda \quad &\Longrightarrow \quad \text{(8.1) is infeasible with probability} \geq 1 - p_C(\lambda).
\end{aligned}
\]

The function $p_C$ is defined by the expression (6.1).

Table 8.1: Empirical versus theoretical phase transitions. For each cone $C_i$ listed in (8.2), we compare the theoretical location of the phase transition, equal to the statistical dimension $\delta(C_i)$, with the empirical location $\mu_i$, computed from the logistic regression model in Figure 8.1. The last column lists the errors, relative to the dimension $d = 396$ of the problem.

Cone     $\delta(C_i)$: Theoretical     $\mu_i$: Empirical     $|\delta(C_i) - \mu_i| / d$: Error
$C_1$    66.67                          66.88                  0.054%
$C_2$    51.00                          51.70                  0.177%
$C_3$    35.50                          36.06                  0.141%

Proof. Amelunxen & Bürgisser [AB12, Thm. 1.3] have shown that the intrinsic volumes of the cone $C$ control the properties of the random cone program (8.1):

\[
\begin{aligned}
\mathbb{P}\{ \text{(8.1) is infeasible} \} &= 1 - t_m(C); \\
\mathbb{P}\{ \text{(8.1) has a unique minimizer} \} &= v_m(C); \\
\mathbb{P}\{ \text{(8.1) is unbounded} \} &= t_{m+1}(C).
\end{aligned}
\]

We apply Theorem 6.1 to see that the tail functional $t_{m+1}(C)$ is extremely close to one when the number $m$ of constraints is smaller than the statistical dimension $\delta(C)$. Likewise, $t_m(C)$ is extremely close to zero when the number $m$ of constraints is larger than the statistical dimension. We omit the details, which are analogous to the proof of Theorem 7.1. □
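
The three probability formulas in this proof can be evaluated exactly for a cone whose intrinsic volumes are known in closed form. The sketch below takes the nonnegative orthant, for which $v_k(\mathbb{R}_+^d) = 2^{-d} \binom{d}{k}$ and $\delta(\mathbb{R}_+^d) = d/2$, and assumes that the tail functional is $t_k = v_k + v_{k+1} + \cdots$. The function names and the choice $d = 100$ are illustrative assumptions, not part of the paper's experiments.

```python
from math import comb

def orthant_intrinsic_volumes(d):
    # Intrinsic volumes of the nonnegative orthant: v_k = binom(d, k) / 2^d.
    return [comb(d, k) / 2 ** d for k in range(d + 1)]

def tail(v, k):
    # Tail functional t_k = v_k + v_{k+1} + ... (an empty sum gives zero).
    return sum(v[k:])

def cone_program_probabilities(v, m):
    # Formulas quoted in the proof of Theorem 8.1:
    # infeasible = 1 - t_m, unique minimizer = v_m, unbounded = t_{m+1}.
    infeasible = 1.0 - tail(v, m)
    unique = v[m] if m < len(v) else 0.0
    unbounded = tail(v, m + 1)
    return infeasible, unique, unbounded

d = 100  # illustrative dimension; the statistical dimension is d / 2 = 50
v = orthant_intrinsic_volumes(d)
for m in (30, 45, 50, 55, 70):
    infeasible, unique, unbounded = cone_program_probabilities(v, m)
    print(f"m={m:3d}  infeasible={infeasible:.4f}  "
          f"unique={unique:.4f}  unbounded={unbounded:.4f}")
```

In this example the program is almost surely unbounded well below $m = 50$ and almost surely infeasible well above it, which is the behavior that Theorem 8.1 predicts around the statistical dimension.
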

8.2.  A  numerical  example.  We  have  conducted  a  computer  experiment  to  compare  the  predictions  of 
Theorem  8.1  with  the  empirical  behavior  of  a  generic  cone  program.  For  this  purpose,  we  study  some  random 
second-order cone programs. In each case, the ambient dimension $d = 396$, and we consider three options for the cone $C$ in (8.1):

\[
\begin{aligned}
C_1 &:= \mathrm{Circ}_d(\alpha_1); &\text{(8.2a)} \\
C_2 &:= \mathrm{Circ}_{d/2}(\alpha_2) \times \mathrm{Circ}_{d/2}(\alpha_2); &\text{(8.2b)} \\
C_3 &:= \mathrm{Circ}_{d/3}(\alpha_3) \times \mathrm{Circ}_{d/3}(\alpha_3) \times \mathrm{Circ}_{d/3}(\alpha_3). &\text{(8.2c)}
\end{aligned}
\]

The angles satisfy $\tan^2(\alpha_1) = \tfrac{1}{5}$ and $\tan^2(\alpha_2) = \tfrac{1}{7}$ and $\tan^2(\alpha_3) = \tfrac{1}{11}$. Using the product rule (4.6) and the integral expression (C.1) for the statistical dimension of a circular cone, numerical quadrature yields

\[ \delta(C_1) \approx 66.67; \qquad \delta(C_2) \approx 51.00; \qquad \delta(C_3) \approx 35.50. \]

Theorem 8.1 indicates that a cone program (8.1) with the cone $C_i$ and generic constraints is likely to be feasible when the number $m$ of affine constraints is smaller than $\delta(C_i)$; it is likely to be infeasible when the number $m$ of affine constraints is larger than $\delta(C_i)$.
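
The reported statistical dimensions can be cross-checked by simulation rather than quadrature. The sketch below is a Monte Carlo estimate, not the paper's method. It assumes the definition $\delta(C) = \mathbb{E}\,\|\Pi_C(g)\|^2$ for a standard normal vector $g$, the parametrization $\mathrm{Circ}_d(\alpha) = \{x : x_1 \geq \|x\| \cos\alpha\}$, and the angle values $\tan^2(\alpha_i) = 1/5, 1/7, 1/11$. The function name, sample size, and seed are hypothetical.

```python
import math
import random

def circular_cone_statdim(d, tan2_alpha, n_samples=100_000, seed=0):
    # Monte Carlo estimate of E ||Pi_C(g)||^2 for C = Circ_d(alpha).
    rng = random.Random(seed)
    alpha = math.atan(math.sqrt(tan2_alpha))
    c, s, tan_a = math.cos(alpha), math.sin(alpha), math.tan(alpha)
    total = 0.0
    for _ in range(n_samples):
        t = rng.gauss(0.0, 1.0)  # component of g along the cone axis
        # Norm of the remaining d - 1 components: square root of a
        # chi-square variable, sampled as Gamma(shape=(d-1)/2, scale=2).
        r = math.sqrt(rng.gammavariate((d - 1) / 2.0, 2.0))
        if r <= t * tan_a:
            total += t * t + r * r        # g lies in the cone
        elif r * tan_a <= -t:
            total += 0.0                  # g lies in the polar cone
        else:
            total += (t * c + r * s) ** 2  # project onto a boundary ray
    return total / n_samples

d = 396
estimates = {
    "C1": circular_cone_statdim(d, 1 / 5),
    # Product rule (4.6): statistical dimensions add over the factors.
    "C2": 2 * circular_cone_statdim(d // 2, 1 / 7),
    "C3": 3 * circular_cone_statdim(d // 3, 1 / 11),
}
for name, value in estimates.items():
    print(name, round(value, 2))
```

With these settings the estimates should land within a few tenths of the quadrature values 66.67, 51.00, and 35.50; the exact digits depend on the random seed.
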

We  can  test  this  prediction  numerically.  For  each  i  =1,2,3  and  each  m  e  { 1,2,3,...,  [|d]},  we  perform  the 
following  steps  50  times: 

(1) Independently draw a standard normal matrix $A \in \mathbb{R}^{m \times d}$ and standard normal vectors $u \in \mathbb{R}^d$ and $b \in \mathbb{R}^m$.

(2) Use the Matlab package CVX to solve the cone program (8.1) with $C = C_i$.

(3) Report failure if CVX declares the cone program infeasible.

For each $i = 1, 2, 3$, Figure 8.1 displays the empirical success probability, along with a logistic fit (Appendix A.3). We also mark the theoretical estimate for the location of the phase transition, which is equal to the statistical dimension $\delta(C_i)$. Table 8.1 reports the discrepancy between the theoretical and empirical behaviors.

9.  Application:  Vectors  from  lists? 

This  section  describes  a  situation  where  our  results  prove  that  a  particular  linear  inverse  problem  does  not 
provide  an  effective  way  to  recover  a  structured  vector.  Indeed,  a  significant  contribution  of  our  theory,  which 
has  no  parallel  in  the  current  literature,  is  that  we  can  obtain  negative  results  as  well  as  positive  results. 

In  [CRPW12,  Sec.  2.2],  Chandrasekaran  et  al.  propose  a  method  for  recovering  a  vector  from  an  unordered 
list  of  its  entries,  along  with  some  linear  measurements.  Here  is  one  way  to  frame  this  problem.  Suppose 
that $x_0 \in \mathbb{R}^d$ is an unknown vector. We are given the vector $y_0 = x_0^{\downarrow}$, whose entries list the components of $x_0$ in weakly decreasing order. We also collect data $z_0 = A x_0$, where $A$ is an $m \times d$ matrix. To identify $x_0$, we must solve a structured linear inverse problem.

To solve this problem, Chandrasekaran et al. propose to use a convex regularizer $f$ that exploits the information in the vector $y_0$. They consider the Minkowski gauge of the permutahedron $\mathcal{P}(y_0)$ generated by $y_0$,

\[ \|x\|_{\mathcal{P}(y_0)} := \inf\{ \tau > 0 : x \in \tau \mathcal{P}(y_0) \}, \]

and they frame the regularized linear inverse problem

\[ \text{minimize} \quad \|x\|_{\mathcal{P}(y_0)} \quad \text{subject to} \quad z_0 = A x. \tag{9.1} \]

It  is  natural  to  ask  how  many  linear  samples  we  need  to  be  able  to  solve  this  inverse  problem  reliably.  Our 
theory  allows  us  to  answer  this  question  decisively  when  the  measurements  are  random. 

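Proposition 9.1 (Vectors from lists?). Let $x_0 \in \mathbb{R}^d$ be a fixed vector with distinct entries. Suppose we are given the data $y_0 = x_0^{\downarrow}$ and $z_0 = A x_0$, where the matrix $A \in \mathbb{R}^{m \times d}$ has standard normal entries. In the range $0 \leq \lambda \leq \sqrt{H_d}$, it holds that

\[
\begin{aligned}
m \leq d - H_d - \lambda \sqrt{H_d} \quad &\Longrightarrow \quad \text{(9.1) succeeds with probability} \leq 4 e^{-\lambda^2/16}; \\
m \geq d - H_d + \lambda \sqrt{H_d} \quad &\Longrightarrow \quad \text{(9.1) succeeds with probability} \geq 1 - 4 e^{-\lambda^2/16}.
\end{aligned}
\]

The $d$th harmonic number $H_d := \sum_{k=1}^{d} k^{-1}$ satisfies $\log d < H_d < 1 + \log d$.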

Proposition  9.1  yields  the  depressing  assessment  that  we  need  a  near-complete  set  of  linear  measurements  to 
resolve  our  uncertainty  about  the  ordering  of  the  vector.  Nevertheless,  we  do  not  need  all  of  the  measurements. 
It  would  be  interesting  to  understand  how  much  the  situation  improves  for  vectors  with  many  duplicated 
entries. 
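
To make the proposition concrete, the following sketch evaluates the predicted transition location $d - H_d$ and the window $d - H_d \pm \lambda \sqrt{H_d}$ for a given dimension. The function names are hypothetical, and the choice $d = 100$ is for illustration only.

```python
import math

def harmonic(d):
    # d-th harmonic number H_d = 1 + 1/2 + ... + 1/d.
    return sum(1.0 / k for k in range(1, d + 1))

def list_recovery_window(d, lam):
    # Thresholds and tail bound from Proposition 9.1, valid for
    # 0 <= lam <= sqrt(H_d).
    h = harmonic(d)
    if not 0.0 <= lam <= math.sqrt(h):
        raise ValueError("lam must lie in [0, sqrt(H_d)]")
    center = d - h
    half_width = lam * math.sqrt(h)
    tail_bound = 4.0 * math.exp(-lam ** 2 / 16.0)
    return center - half_width, center + half_width, tail_bound

d = 100
h = harmonic(d)
location = round(d - h, 2)
lower, upper, tail_bound = list_recovery_window(d, math.sqrt(h))
print(location, round(lower, 2), round(upper, 2), round(tail_bound, 3))
```

One caveat follows directly from the formula: the bound $4 e^{-\lambda^2/16}$ drops below one only when $\lambda > 4\sqrt{\log 4}$, and the constraint $\lambda \leq \sqrt{H_d}$ rules this out unless $H_d > 16 \log 4$. For moderate dimensions such as $d = 100$, the proposition therefore locates the transition but its probability bounds are not yet informative.
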

Proof. This result follows from Fact 2.6 and the kinematic bound (7.1) in Theorem 7.1 as soon as we compute the statistical dimension of the descent cone of the regularizer $\|\cdot\|_{\mathcal{P}(y_0)}$ at the point $x_0$. By construction, the unit ball of $\|\cdot\|_{\mathcal{P}(y_0)}$ coincides with the permutahedron $\mathcal{P}(y_0)$, which equals $\mathcal{P}(x_0)$ by permutation invariance. Therefore,

\[ \mathcal{D}(\|\cdot\|_{\mathcal{P}(y_0)}, x_0) = \mathcal{D}(\|\cdot\|_{\mathcal{P}(x_0)}, x_0) = \mathcal{N}(\mathcal{P}(x_0), x_0)^\circ. \]

The second identity follows from (3.1). See Figure 4.2 for an illustration of the corresponding facts about signed permutahedra. To compute the statistical dimension, we apply the totality law (4.5) to see that

\[ \delta\bigl( \mathcal{D}(\|\cdot\|_{\mathcal{P}(y_0)}, x_0) \bigr) = d - \delta\bigl( \mathcal{N}(\mathcal{P}(x_0), x_0) \bigr) = d - H_d, \]

where the second relation follows from Proposition 4.9. Apply the kinematic result (7.1) for subspaces, and invoke (6.4) to simplify the error bound $p_C(\lambda)$. □

Remark  9.2  (Signed  vectors).  The  same  negative  results  hold  for  the  problem  of  reconstructing  a  general 
vector  from  an  unordered  list  of  the  magnitudes  of  its  entries,  along  with  some  linear  measurements.  In 
this  case,  the  appropriate  regularizer  is  the  Minkowski  gauge  of  the  signed  permutahedron.  We  can  use 
Proposition 4.9 to compute the statistical dimension of the descent cone. For a $d$-dimensional vector with distinct entries, we need about $d - \tfrac{1}{2} H_d$ random measurements to succeed reliably.

9.1. A numerical example. We present a computer experiment that confirms our pessimistic analysis. Fix the ambient dimension $d = 100$. Set $x_0 = (1, 2, \dots, 100)$ and $y_0 = x_0^{\downarrow}$. For each $m = 85, 86, \dots, 100$, we repeat the following procedure 50 times:

(1) Draw a matrix $A \in \mathbb{R}^{m \times d}$ with independent standard normal entries, and form $z_0 = A x_0$.

(2) Use the Matlab package CVX to solve the linear inverse problem (9.1).

(3) Declare success if the solution $\hat{x}$ satisfies $\|\hat{x} - x_0\| \leq 10^{-5}$.

Figure 9.1 displays the outcome of this experiment. As usual, the phase transition predicted at the statistical dimension $d - H_d$ is very close to the empirical 50% mark, which we obtain by performing a logistic regression of the data (see Appendix A.3).

Figure 9.1: Vectors from lists? The empirical probability that the convex program (9.1) correctly identifies a vector $x_0$ in $\mathbb{R}^{100}$ with distinct entries, provided an unordered list $y_0$ of the entries of $x_0$ and $m$ random linear measurements $z_0 = A x_0$. The solid gray curve marks the logistic fit to the data. The midpoint of the logistic curve is $\mu = 95.17$ (finely dashed line), while the theory predicts a phase transition at the statistical dimension $\delta = 94.81$ (coarsely dashed line). The error relative to the dimension is $|\mu - \delta| / d = 0.36\%$.

10.  Related  work 

To  conclude  the  body  of  the  paper,  we  place  our  work  in  the  context  of  the  literature  on  geometric  analysis 
of  random  convex  optimization  problems.  We  trace  four  lines  of  thought  on  this  subject.  The  first  draws  from 
the  theory  of  polytope  angles;  the  second  involves  conic  integral  geometry;  the  third  is  based  on  comparison 
inequalities  for  Gaussian  processes;  and  the  last  makes  a  connection  with  statistical  decision  theory.  Our 
results  have  some  overlap  with  earlier  work,  but  our  discovery  that  the  sequence  of  conic  intrinsic  volumes 
concentrates  at  the  statistical  dimension  allows  us  to  resolve  several  subtle  but  important  questions  that  have 
remained  open  until  now. 

10.1. Asymptotic polytope-angle computations for inverse problems. The theory of polytope angles dates to the work of Schläfli in the 1850s [Sch50b]. In pioneering research, Vershik & Sporyshev [VS86] applied these ideas to analyze random convex optimization problems. They were able to estimate the average number of steps that the simplex algorithm requires to solve a linear program with random constraints as the number of decision variables tends to infinity. This research inspired further theoretical work on the neighborliness of random polytopes [VS92, AS92, BH99]. More recently, Donoho [Don06b] and Donoho & Tanner [DT05, DT09a, DT10a, DT10b] have used similar ideas to study specific regularized linear inverse problems with random data. The papers [XH11, KXAH11] contain some additional work in this direction. Let us offer a short, qualitative summary of this research.

Donoho [Don06b] analyzed the performance of the convex program (1.1) for solving the compressed sensing problem described in Section 1. In the asymptotic regime where the number $s$ of nonzeros is proportional to the ambient dimension $d$, he obtained a lower bound $m \geq \psi(s)$ on the number $m$ of Gaussian measurements required for the optimization to succeed (the weak transition). Numerical experiments [DT09b] suggest that this bound is sharp, but the theoretical analysis in [Don06b] falls short of establishing that a phase transition actually exists and identifying its location rigorously. Finite-dimensional results with a similar flavor appear in [DT10b].

Donoho  [Don06b]  also  established  an  asymptotic  lower  bound  on  the  number  m  of  random  measurements 
required to recover all $s$-sparse vectors in $\mathbb{R}^d$ with high probability (the strong threshold). Using different
methods,  Stojnic  [Sto09]  has  improved  this  bound  for  some  values  of  the  sparsity  s.  These  bounds  are  not 
subject  to  numerical  interrogation,  so  we  do  not  have  reliable  evidence  about  what  actually  happens.  Indeed, 
it  remains  an  open  question  to  prove  that  a  strong  phase  transition  exists  and  to  identify  its  exact  location  in 
the  regime  where  the  sparsity  is  proportional  to  the  ambient  dimension. 

Donoho & Tanner [DT09a] have also made a careful study of the behavior of the convex program (1.1) in the asymptotic regime where the sparsity $s \ll d$. In this case, they succeeded in proving that weak and strong thresholds exist, and they obtained exact formulas for the thresholds. More precisely, at the computed thresholds, they show that the probability of success jumps from one to $1 - \varepsilon$, where $\varepsilon$ is positive. Although these results do not ensure that certain failure awaits on the other side of the threshold curve, they do establish that the behavior changes.

Donoho & Tanner [DT05, DT09a] provide similar results for the problem of recovering a sparse nonnegative vector by solving the $\ell_1$ minimization problem (1.1) with an additional nonnegativity constraint. Once again, they obtain lower bounds on the number of Gaussian measurements required for weak and strong success. These bounds are sharp in the ultrasparse regime $s = o(\log d)$. The earlier work of Vershik & Sporyshev [VS86] contains results equivalent to the weak transition estimate from [DT09a].

Other authors have used polytope angle calculations to develop theory for related $\ell_1$ minimization problems. For example, Khajehnejad et al. [KXAH11] provide an analysis of the performance of weighted $\ell_1$ regularizers. Xu & Hassibi [XH11] obtain lower bounds for the number of measurements required for stable recovery of a sparse vector via $\ell_1$ minimization.

Finally, let us mention that Donoho & Tanner [DT10a] have obtained a more complete theory about the location of the phase transition for a regularized linear inverse problem where the $\ell_\infty$ norm is used as a regularizer. Their results, formulated in terms of projections of the hypercube and orthant, are a consequence of a geometric theorem that goes back to Schläfli [Sch50a]; see the discussion [SW08, p. 299].

10.1.1.  Commentary.  The  analysis  of  structured  inverse  problems  by  means  of  polytope  angle  computations 
has  led  to  some  striking  conclusions,  but  this  approach  has  inherent  limitations.  First,  the  method  is  restricted 
to  polyhedral  cones,  which  means  that  it  is  silent  about  the  behavior  of  many  important  regularizers,  including 
the  Schatten  1-norm.  Second,  it  requires  detailed  bounds  on  all  angles  of  a  given  polytope  (equivalently,  all 
the  intrinsic  volumes  of  the  normal  cones  of  the  polytope),  which  means  that  it  is  difficult  to  extend  beyond  a 
few  highly  symmetric  examples.  For  this  reason,  most  of  the  existing  results  are  asymptotic  in  nature.  Third, 
because  of  the  intricacy  of  the  calculations,  this  research  has  produced  few  definitive  results  of  the  form  “the 
probability  of  success  jumps  from  one  to  zero  at  a  specified  location.” 

We believe that our analysis supersedes most of the research on weak phase transitions for $\ell_1$-regularized linear inverse problems that is based on polytope angles. We have shown for the first time that there is a transition from absolute success to absolute failure, and we have characterized the location of the threshold when the sparsity $s \geq \sqrt{d} + 1$. On the other hand, we have not verified that our upper bound for the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector is sharp when $s < \sqrt{d}$. Currently, the paper [DT09a] contains the only authoritative results in the ultrasparse regime $s = o(\log d)$.

It is not hard to extend our analysis to the other settings discussed in this section. Indeed, we can easily study regularized inverse problems involving weighted $\ell_1$ norms and $\ell_1$ norms with nonnegativity constraints. We can effortlessly rederive phase transitions for $\ell_\infty$-regularized problems. Bounds for strong transitions are also accessible to our methods. We have omitted all of this material for brevity.

10.2. Conic intrinsic volumes. In modern geometry, work on polytope angles has largely been supplanted by research on spherical and conical integral geometry [SW08]. Several authors have independently recognized the power of this approach for analyzing random instances of convex optimization problems.

Amelunxen [Ame11] and Amelunxen & Bürgisser [AB11, AB12] have shown that conic geometry offers
an  elegant  way  to  perform  average-case  and  smoothed  analysis  of  conic  optimization  problems.  Their  work 
requires  detailed  computations  of  conic  intrinsic  volumes,  which  can  make  it  challenging  to  apply  to  particular 
cases.  We  can  simplify  some  of  their  techniques  using  the  new  fact,  Theorem  6.1,  that  intrinsic  volumes 
concentrate  at  the  statistical  dimension.  Theorem  8.1  is  based  on  their  research. 

McCoy  &  Tropp  [MT12]  have  used  conic  geometry  to  study  the  behavior  of  regularized  linear  inverse 
problems  with  random  measurements  and  regularized  demixing  problems  under  a  random  model.  This 
approach  leads  to  both  upper  and  lower  bounds  for  weak  and  strong  phase  transitions  in  a  variety  of  problems. 
As with Amelunxen's work [Ame11], this research depends on detailed computations of conic intrinsic
volumes.  As  a  consequence,  it  was  not  possible  to  rigorously  locate  the  phase  transition,  nor  was  there  any 
general  theory  to  inform  us  that  phase  transitions  must  exist  in  general.  Combining  the  ideas  from  [MT12] 
with  Theorem  7.1,  we  are  able  to  reach  more  definitive  conclusions. 

10.3.  Gordon’s  comparison  and  Gaussian  widths.  The  work  we  have  discussed  so  far  depends  on  various 
flavors  of  integral  geometry.  There  is  a  completely  different  technique  for  analyzing  linear  inverse  problems 
with  random  data  that  depends  on  a  comparison  principle  for  Gaussian  processes,  due  to  Gordon  [Gor85]. 
Gordon [Gor88] explains how to use this comparison to find accurate bounds on the probability that a
randomly  oriented  subspace  strikes  a  subset  of  the  sphere.  In  particular,  his  ideas  can  be  used  to  bound  the 
probability  that  the  null  space  of  a  random  matrix  intersects  a  descent  cone,  which  results  in  a  version  of 
Theorem  II. 

Rudelson  &  Vershynin  [RV08]  were  the  first  authors  to  observe  that  Gordon’s  work  is  relevant  to  the 
analysis  of  the  compressed  sensing  problem.  Stojnic  [Sto09]  refined  the  method  enough  that  he  was  able  to 
establish  empirically  sharp  lower  bounds  for  the  number  m  of  measurements  required  for  the  compressed 
sensing problem. Oymak & Hassibi [OH10] have also applied these techniques to develop empirically sharp
lower  bounds  on  the  number  of  random  measurements  needed  to  identify  a  low-rank  matrix  using  the 
Schatten  1-norm.  Later,  Chandrasekaran  et  al.  [CRPW12]  showed  that  a  similar  approach  leads  to  empirically 
sharp  lower  bounds  on  the  number  of  random  measurements  required  to  solve  other  types  of  regularized 
inverse  problems. 

All of this work depends on a summary parameter for convex cones called the Gaussian width. For a closed convex cone $C \subset \mathbb{R}^d$, the width is defined as

\[ w(C) := \mathbb{E} \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle. \tag{10.1} \]

The Gaussian width has a tight connection with the statistical dimension.

Proposition 10.1 (Statistical dimension & Gaussian width). Let $C$ be a convex cone. Then

\[ w^2(C) \leq \delta(C) \leq w^2(C) + 1. \tag{10.2} \]

The  lower  bound  in  (10.2)  is  an  easy  consequence  of  duality,  and  the  upper  bound  depends  on  some 
concentration  arguments.  See  Appendix  E  for  a  short  proof. 
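
The sandwich (10.2) is easy to observe numerically for a cone with an explicit projection. The sketch below uses the nonnegative orthant $\mathbb{R}_+^d$ as an assumed test case, with $g$ a standard normal vector. For this cone the projection of $g$ is its positive part, so $\delta(C) = \mathbb{E}\,\|g_+\|^2 = d/2$, and the supremum in (10.1) equals $\|g_+\|$ whenever $g$ has a positive entry and $\max_i g_i$ otherwise. The function name, dimension, and sample size are hypothetical choices.

```python
import math
import random

def orthant_width_and_statdim(d, n_samples=20_000, seed=0):
    # Monte Carlo estimates of the Gaussian width w(C) and the
    # statistical dimension delta(C) for C = nonnegative orthant.
    rng = random.Random(seed)
    sum_sup = 0.0
    sum_proj_sq = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        pos_sq = sum(x * x for x in g if x > 0.0)  # ||g_+||^2
        # Supremum of <g, x> over unit vectors x in the orthant.
        sup = math.sqrt(pos_sq) if pos_sq > 0.0 else max(g)
        sum_sup += sup
        sum_proj_sq += pos_sq
    return sum_sup / n_samples, sum_proj_sq / n_samples

d = 20
width, statdim = orthant_width_and_statdim(d)
gap = statdim - width ** 2
print(round(width ** 2, 3), round(statdim, 3), round(gap, 3))
```

The gap $\delta(C) - w^2(C)$ is the variance of the supremum up to a negligible correction, which is why it stays between zero and one in this example.
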

As  a  consequence  of  Proposition  10.1,  we  can  import  ideas  from  the  literature  on  Gaussian  widths  to  obtain 
accurate  computations  of  the  statistical  dimension.  Conversely,  our  calculations  of  the  statistical  dimension 
lead  to  accurate  bounds  for  Gaussian  widths.  In  particular,  Theorem  4.5  provides  the  first  proof  that  previous 
calculations  of  Gaussian  widths  are  essentially  sharp. 

We  have  a  strong  preference  for  the  statistical  dimension  over  the  Gaussian  width.  Indeed,  the  statistical 
dimension  canonically  extends  the  linear  dimension  to  the  class  of  convex  cones  (Proposition  5.12).  The 
statistical  dimension  also  summarizes  the  sequence  of  conic  intrinsic  volumes  (Proposition  5.11).  Since 
intrinsic  volumes  drive  the  kinematic  formula  (5.9)  and  its  generalization  (5.11),  this  connection  has  many 
consequences  for  conic  integral  geometry. 

10.4.  Minimax  Denoising.  Several  authors  [DMM09a,  DJM11,  DGM13]  have  remarked  on  the  power  of 
statistical  decision  theory  to  empirically  predict  the  location  of  the  phase  transition  in  a  regularized  linear 
inverse  problem  with  random  data.  For  the  compressed  sensing  problem,  two  recent  papers  [BM12,  BLM12] 
provide  a  rigorous  explanation  for  this  coincidence.  But  there  is  no  general  theory  that  illuminates  the 
connection  between  these  two  settings.  Our  work,  together  with  a  recent  paper  of  Oymak  &  Hassibi  [OH12], 
resolves  this  issue.  In  short,  Oymak  &  Hassibi  show  that  the  minimax  risk  for  denoising  is  essentially  the  same 
as  the  statistical  dimension,  while  our  research  proves  that  a  phase  transition  must  occur  at  the  statistical 
dimension.  Let  us  elaborate. 

A classical problem in statistics is to estimate a target vector $x_0$ given an observation of the form $z_0 = x_0 + \sigma g$, where $g$ is a standard normal vector and $\sigma^2$ is an unknown variance parameter. When the unknown vector $x_0$ has specified properties (e.g., sparsity), we can often construct a convex regularizer $f$ that promotes this type of structure [CRPW12]. A natural estimation procedure is to solve the convex optimization problem

\[ \hat{x}_\gamma := \operatorname*{arg\,min}_{x} \; \gamma f(x) + \tfrac{1}{2} \| z_0 - x \|^2. \tag{10.3} \]

The regularization parameter $\gamma > 0$ negotiates a tradeoff between the structural penalty and the data fidelity term. One way to assess the performance of the estimator (10.3) is the minimax MSE risk,³ defined as

\[ R_{\mathrm{mm}}(x_0) := \sup_{\sigma > 0} \inf_{\gamma > 0} \frac{1}{\sigma^2} \, \mathbb{E}\bigl[ \| \hat{x}_\gamma - x_0 \|^2 \bigr]. \]

³ The usual definition of the minimax risk involves an additional supremum over a class of distributions on the target $x_0$. In many applications, the symmetries in the regularizer $f$ allow a straightforward reduction to the case of a fixed target $x_0$.

In other words, the risk identifies the relative mean-square error for the best choice of tuning parameter $\gamma$ at the worst choice of the noise variance $\sigma^2$.
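
As a concrete instance of the estimator (10.3), consider the special case $f = \|\cdot\|_1$. This choice is an assumption made for illustration; the text treats a general convex regularizer. In that case the minimizer has a well-known closed form, coordinatewise soft thresholding. The sketch below implements it and compares the objective value at the minimizer with the value at a few perturbed points. The function names and the test vector are hypothetical.

```python
import math

def soft_threshold(z, gamma):
    # Closed-form minimizer of gamma * ||x||_1 + 0.5 * ||z - x||^2:
    # shrink each coordinate toward zero by gamma, clipping at zero.
    return [math.copysign(max(abs(zi) - gamma, 0.0), zi) for zi in z]

def objective(x, z, gamma):
    # Objective of (10.3) with f taken to be the l1 norm.
    penalty = gamma * sum(abs(xi) for xi in x)
    fidelity = 0.5 * sum((zi - xi) ** 2 for zi, xi in zip(z, x))
    return penalty + fidelity

z0 = [3.0, -0.5, 1.2]   # illustrative observation
gamma = 1.0             # illustrative regularization parameter
x_hat = soft_threshold(z0, gamma)
best = objective(x_hat, z0, gamma)

# Perturb one coordinate at a time; the objective should not decrease.
perturbed_values = []
for i in range(len(z0)):
    for step in (-0.1, 0.1):
        x = list(x_hat)
        x[i] += step
        perturbed_values.append(objective(x, z0, gamma))

print(x_hat, best, min(perturbed_values))
```

Estimating the minimax risk itself would require averaging $\|\hat{x}_\gamma - x_0\|^2 / \sigma^2$ over draws of $g$ and then optimizing over $\gamma$ and $\sigma$, which this sketch does not attempt.
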

The papers [DMM09a, DJM11, DGM13] examine several regularizers $f$ where the minimax risk empirically predicts the performance of the linear inverse problem (2.5) with a Gaussian measurement matrix $A$. The authors of this research propound a conjecture that may be expressed as follows.

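Conjecture 10.2 (Minimax risk predicts phase transitions). Suppose that $A \in \mathbb{R}^{m \times d}$ is a matrix with independent standard normal entries, and let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function. Then

\[
\begin{aligned}
m \geq R_{\mathrm{mm}}(x_0) + o(d) \quad &\Longrightarrow \quad \text{(2.5) succeeds with probability } 1 - o(1); \\
m \leq R_{\mathrm{mm}}(x_0) + o(d) \quad &\Longrightarrow \quad \text{(2.5) succeeds with probability } o(1).
\end{aligned}
\]

The order notation here should be interpreted heuristically.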

To the best of our knowledge, this claim has been established rigorously only for the $\ell_1$ norm [BLM12, Thm. 8]
in the asymptotic setting. The paper [BLM12] also includes analysis for a wider class of matrices.

Together, our paper and the recent paper [OH12] settle Conjecture 10.2 in the nonasymptotic setting for
many regularizers of interest. Indeed, Oymak & Hassibi [OH12] prove that

$$\big| R_{\mathrm{mm}}(x_0) - \delta(\mathcal{D}(f, x_0)) \big| = O(\sqrt{d}).$$

Their result holds under mild conditions on the regularizer $f$ that suffice to address most of the phase
transitions conjectured in the literature. Our result, Theorem II, demonstrates that the phase transition in the
linear inverse problem (2.5) with a standard normal matrix $A \in \mathbb{R}^{m \times d}$ occurs when

$$\big| m - \delta(\mathcal{D}(f, x_0)) \big| = O(\sqrt{d}).$$

Combining  these  two  results,  we  conclude  that,  in  some  generality,  the  minimax  risk  coincides  with  the 
location  of  the  phase  transition  in  a  regularized  linear  inverse  problem  with  random  measurements. 

Appendix  A.  Computer  experiments 

We confirm the predictions of our theoretical analysis by performing computer experiments. This appendix
contains some of the details of our numerical work. All experiments were performed using the CVX
package [GB13] for Matlab with the default settings in place.

A.1. Linear inverse problems with random measurements. This section describes the two experiments
from Section 2.3 that illustrate the empirical phase transition in compressed sensing via $\ell_1$ minimization and
in low-rank matrix recovery via Schatten 1-norm minimization.

In the compressed sensing example, we fix the ambient dimension $d = 100$. For each $m = 1, 2, 3, \dots, d - 1$ and
each $s = 1, 2, 3, \dots, d - 1$, we repeat the following procedure 50 times:

(1) Construct a vector $x_0 \in \mathbb{R}^d$ with $s$ nonzero entries. The locations of the nonzero entries are selected at
random; each nonzero equals $\pm 1$ with equal probability.

(2) Draw a standard normal matrix $A \in \mathbb{R}^{m \times d}$, and form $z_0 = A x_0$.

(3) Solve (2.6) to obtain an optimal point $\hat{x}$.

(4) Declare success if $\|\hat{x} - x_0\| \le 10^{-5}$.

All  random  variables  are  drawn  independently  in  each  step  and  at  each  iteration.  Figures  1.1  and  2.2  [left] 
show  the  empirical  probability  of  success  for  this  procedure. 
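To make the bookkeeping of steps (1)-(4) concrete, here is a minimal Python sketch of one trial and of the empirical success rate. It is not the Matlab/CVX code used for the experiments. The function names are hypothetical, and the solver for (2.6) is left as a user-supplied callable `solve_l1` because no particular solver interface is assumed here.

```python
import random


def make_sparse_signs(d, s, rng):
    """Step (1): vector in R^d with s nonzeros at random locations, each +1 or -1."""
    x0 = [0.0] * d
    for i in rng.sample(range(d), s):
        x0[i] = rng.choice((-1.0, 1.0))
    return x0


def run_trial(d, m, s, solve_l1, rng, tol=1e-5):
    """One pass through steps (1)-(4); solve_l1(A, z0) is a hypothetical solver for (2.6)."""
    x0 = make_sparse_signs(d, s, rng)
    # Step (2): standard normal m x d matrix and measurements z0 = A x0.
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    z0 = [sum(a * x for a, x in zip(row, x0)) for row in A]
    # Step (3): delegate the convex program to the supplied solver.
    x_hat = solve_l1(A, z0)
    # Step (4): success if the Euclidean error is at most tol.
    err = sum((a - b) ** 2 for a, b in zip(x_hat, x0)) ** 0.5
    return err <= tol


def success_rate(d, m, s, solve_l1, trials=50, seed=0):
    """Empirical probability of success over independent trials."""
    rng = random.Random(seed)
    return sum(run_trial(d, m, s, solve_l1, rng) for _ in range(trials)) / trials
```

Sweeping `success_rate` over the grid of pairs $(m, s)$ gives the kind of table that the empirical phase transition plots display.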

We take a similar approach in the low-rank matrix recovery problem. Fix $n = 30$, and consider square $n \times n$
matrices. For each rank $r = 1, 2, \dots, n$ and each $m = 1, 29, 58, 87, \dots, n^2$, we repeat the following procedure 50
times:

(1) If $r > \lceil \sqrt{m} \rceil + 1$, declare failure because the number of degrees of freedom in an $n \times n$ rank-$r$ matrix
exceeds the number $m$ of measurements.

(2) Draw a rank-$r$ matrix $X_0 = Q_1 Q_2^{t}$, where $Q_1$ and $Q_2$ are independent $n \times r$ matrices with orthonormal
columns, drawn uniformly from an appropriate Stiefel manifold [Mez07].

(3) Draw a standard normal matrix $A \in \mathbb{R}^{m \times n^2}$, and define $\mathcal{A}(X) := A \cdot \operatorname{vec}(X)$, where the vectorization
operator stacks the columns of a matrix. Form the vector of measurements $z_0 = \mathcal{A}(X_0)$.

(4) Solve (2.7) to obtain an optimal point $\hat{X}$.

(5) Declare success if $\|\hat{X} - X_0\|_F \le 10^{-5}$.

As before, all random variables are chosen independently. Readers interested in reproducing this experiment
should be aware that this procedure required nearly one month to execute on a desktop workstation.
Figure 2.2 [right] displays the results of this experiment.

A.2. Statistical dimension curves. The formulas (4.11) and (4.14) for the statistical dimension of the
descent cones of the $\ell_1$ norm and the Schatten 1-norm do not have a closed form representation. Nevertheless,
we can evaluate these expressions using simple numerical methods. Indeed, in each case, we solve the
stationary equation, (4.13) or (4.16), using the rootfinding procedure fzero, which works well because the
left-hand side of each equation is a monotone function of $\tau$. To evaluate the integral in (4.11), we use the
command erfc. To evaluate the integral in (4.14), we use the quadrature function quadgk.

We have encountered some numerical stability problems evaluating (4.11) when the proportional sparsity
$\rho = s/d$ is close to zero or one. Similarly, there are sometimes difficulties with (4.14) when the proportional
rank $\rho = r/m$ or the aspect ratio $\nu = m/n$ are close to zero or one. Nevertheless, relatively simple code based
on this approach is usually reliable.
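The following Python sketch illustrates the procedure for the $\ell_1$ curve, with bisection standing in for fzero and `math.erfc` for the Matlab command erfc. It works with the brace of (C.6) divided by $d$. The stationarity condition in `stationary_lhs` is obtained here by differentiating that brace, which is an assumption of this sketch, not a transcription of (4.13). All function names are hypothetical.

```python
import math


def tail_term(tau):
    """sqrt(2/pi) * int_tau^inf (u - tau)^2 exp(-u^2/2) du, in closed form via erfc."""
    return ((1.0 + tau * tau) * math.erfc(tau / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * tau * math.exp(-tau * tau / 2.0))


def brace(tau, rho):
    """Brace of (C.6) divided by d, where rho = s/d."""
    return rho * (1.0 + tau * tau) + (1.0 - rho) * tail_term(tau)


def stationary_lhs(tau):
    """sqrt(2/pi) * int_tau^inf (u/tau - 1) exp(-u^2/2) du, a decreasing function of tau."""
    return (math.sqrt(2.0 / math.pi) * math.exp(-tau * tau / 2.0) / tau
            - math.erfc(tau / math.sqrt(2.0)))


def l1_stat_dim_curve(rho, lo=1e-12, hi=40.0, iters=200):
    """Solve stationary_lhs(tau) = rho / (1 - rho) by bisection; return (value, tau)."""
    target = rho / (1.0 - rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stationary_lhs(mid) > target:
            lo = mid   # root lies to the right because the left-hand side decreases
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return brace(tau, rho), tau
```

For $\rho$ near zero the root sits at large $\tau$, where `stationary_lhs` is a difference of two tiny numbers; this is one plausible source of the instability mentioned above.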

A.3. Logistic regression. Several of the experiments involve fitting the logistic function

$$\ell(x) := \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

to the data, where $\beta_0, \beta_1 \in \mathbb{R}$ are parameters. We use the command glmfit to accomplish this task. The center
$\mu := -\beta_0 / \beta_1$ of the logistic function is the point such that $\ell(\mu) = \tfrac{1}{2}$.
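As a small illustration of the fitting step, the sketch below fits the two-parameter logistic function by Newton's method on the binomial log-likelihood and then reads off the center $-\beta_0/\beta_1$. It is a stand-in for glmfit under the assumption that the responses are empirical success fractions in $[0, 1]$. It is not the Matlab routine itself, and the function names are hypothetical.

```python
import math


def logistic(x, b0, b1):
    """Logistic function 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, iters=50):
    """Newton iterations for the logistic log-likelihood; ys are fractions in [0, 1]."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(x, b0, b1)
            w = p * (1.0 - p)
            g0 += y - p              # gradient with respect to b0
            g1 += (y - p) * x        # gradient with respect to b1
            h00 += w                 # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def center(b0, b1):
    """Point where the fitted logistic function equals 1/2."""
    return -b0 / b1
```

In a phase transition experiment, `xs` would hold the values of $m$ and `ys` the empirical success probabilities, so that the center estimates the location of the transition.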

Appendix  B.  Theoretical  results  on  descent  cones 

This  appendix  contains  the  theoretical  analysis  that  permits  us  to  calculate  the  statistical  dimension  of 
descent  cones.  In  particular,  we  complete  the  proof  of  Proposition  4.4,  and  we  establish  Theorem  4.5.  The 
material in this appendix has some overlap with independent work due to Oymak & Hassibi [OH12].

B.1. The expected distance to the subdifferential. To complete the proof of Proposition 4.4, we must show
that the function $F : \tau \mapsto \mathbb{E}[\operatorname{dist}^2(g, \tau \cdot \partial f(x))]$ exhibits a number of analytic and geometric properties. The
hypotheses of the proposition ensure that $\partial f(x)$ is a nonempty, compact, convex set that does not contain the
origin. For clarity, we establish an abstract result that only depends on the distinguished properties of the
subdifferential. Let us begin with a lemma about a related, but simpler, function.

Lemma B.1 (Distance to a dilated set). Let $S$ be a nonempty, compact, convex subset of $\mathbb{R}^d$ that does not contain
the origin. In particular, there are numbers that satisfy $b \le \|s\| \le B$ for all $s \in S$. Fix a point $u \in \mathbb{R}^d$, and define the
function

$$F_u : \tau \mapsto \operatorname{dist}^2(u, \tau S) \quad \text{for } \tau \ge 0. \tag{B.1}$$

The following properties hold.

(1) The function $F_u$ is convex.

(2) The function satisfies the lower bound

$$F_u(\tau) \ge (\tau b - \|u\|)^2 \quad \text{for all } \tau \ge \|u\| / b. \tag{B.2}$$

In particular, $F_u$ attains its minimum value in the interval $[0, 2\|u\|/b]$.

(3) The function $F_u$ is continuously differentiable, and the derivative takes the form

$$F_u'(\tau) = -\frac{2}{\tau} \langle u - \pi_{\tau S}(u), \, \pi_{\tau S}(u) \rangle \quad \text{for } \tau > 0. \tag{B.3}$$

The right derivative $F_u'(0)$ exists, and $F_u'(0) = \lim_{\tau \downarrow 0} F_u'(\tau)$.

(4) The derivative admits the bound

$$|F_u'(\tau)| \le 2B (\|u\| + \tau B) \quad \text{for all } \tau \ge 0. \tag{B.4}$$

(5) Furthermore, the map $u \mapsto F_u'(\tau)$ is Lipschitz for each $\tau \ge 0$:

$$|F_u'(\tau) - F_y'(\tau)| \le 2B \|u - y\| \quad \text{for all } u, y \in \mathbb{R}^d. \tag{B.5}$$

Proof. These claims all take some work. Along the way, we also need to establish some auxiliary results to
justify the main points.

Convexity. For $\tau > 0$, convexity follows from the representation

$$F_u(\tau) = \Big[ \inf_{s \in S} \|u - \tau s\| \Big]^2 = \Big[ \tau \cdot \inf_{s \in S} \|u/\tau - s\| \Big]^2 = \big[ \tau \cdot \operatorname{dist}(u/\tau, S) \big]^2. \tag{B.6}$$

By way of justification, the distance to a closed convex set is a convex function [Roc70, p. 34], the perspective
transformation [HUL93a, Sec. IV.2.2] of a convex function is convex, and the square of a nonnegative convex
function is convex by a direct calculation.

Continuity. The representation (B.6) shows that the function $F_u$ is continuous for $\tau > 0$ because the distance
to a convex set is a Lipschitz function, as stated in (3.5). To obtain continuity at $\tau = 0$, simply note that

$$|F_u(\varepsilon) - F_u(0)| = \big| \|u - \pi_{\varepsilon S}(u)\|^2 - \|u\|^2 \big| \le 2 |\langle u, \pi_{\varepsilon S}(u) \rangle| + \|\pi_{\varepsilon S}(u)\|^2 \le 2 \|u\| (\varepsilon B) + (\varepsilon B)^2 \to 0 \quad \text{as } \varepsilon \to 0.$$

Indeed, each point in $\varepsilon S$ is bounded in norm by $\varepsilon B$, so the projection $\pi_{\varepsilon S}(u)$ admits the same bound. Continuity
implies that $F_u$ is convex on the entire range $\tau \ge 0$.

Attainment of minimum. Assume that $\tau \ge \|u\| / b$. Then

$$\operatorname{dist}(u, \tau S) = \inf_{s \in S} \|\tau s - u\| \ge \inf_{s \in S} \big[ \tau \|s\| - \|u\| \big] \ge \tau b - \|u\| \ge 0.$$

Square this relation to reach (B.2). It follows that $F_u(\tau) \ge F_u(0) = \|u\|^2$ for all $\tau \ge 2\|u\|/b$. Therefore, any
minimizer of $F_u$ must occur in the compact interval $[0, 2\|u\|/b]$. Since $F_u$ is continuous, it attains its minimal
value in this range.

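Differentiability. We obtain the derivative from a direct calculation:

$$F_u'(\tau) = \frac{d}{d\tau} \big[ \tau^2 \operatorname{dist}^2(u/\tau, S) \big] = 2\tau \cdot \operatorname{dist}^2(u/\tau, S) + \tau^2 \big\langle 2 \big( (u/\tau) - \pi_S(u/\tau) \big), \, -u/\tau^2 \big\rangle$$

$$= \frac{2}{\tau} \big[ \operatorname{dist}^2(u, \tau S) - \langle u - \pi_{\tau S}(u), \, u \rangle \big] = -\frac{2}{\tau} \langle u - \pi_{\tau S}(u), \, \pi_{\tau S}(u) \rangle.$$

The first relation follows from (B.6). The second relies on the formula (3.6) for the derivative of the squared
distance. To obtain the fourth relation, we express the squared distance as $\|u - \pi_{\tau S}(u)\|^2$.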

Right derivative at zero. The right derivative $F_u'(0)$ exists, and the limit formula holds because $F_u$ is a proper
convex function that is continuous on $[0, \infty)$ and differentiable on $(0, \infty)$; see [Roc70, Thm. 24.1].

Continuity of the derivative. The expression (B.3) already implies that $F_u'$ is continuous for $\tau > 0$ because the
projection onto a convex set is continuous [RW98, Thm. 2.26]. Continuity of the derivative at zero follows
from the limit formula for the right derivative at zero.

Bound for the derivative. Given the formula (B.3), it is easy to control the derivative when $\tau > 0$:

$$|F_u'(\tau)| \le \frac{2}{\tau} \|u - \pi_{\tau S}(u)\| \, \|\pi_{\tau S}(u)\| \le \frac{2}{\tau} (\|u\| + \tau B)(\tau B) = 2B (\|u\| + \tau B).$$

We obtain the estimate for $\tau = 0$ by taking the limit.

Lipschitz property. We obtain the Lipschitz bound (B.5) from (B.3) after some effort. Fix $\tau > 0$. The
optimality condition [HUL93a, Thm. III.3.1.1] for a projection onto a closed convex set implies that

$$\langle y - \pi_{\tau S}(y), \, \pi_{\tau S}(y) \rangle \ge \langle y - \pi_{\tau S}(y), \, \pi_{\tau S}(u) \rangle \quad \text{for all } u, y \in \mathbb{R}^d.$$

As a consequence,

$$\langle u - \pi_{\tau S}(u), \, \pi_{\tau S}(u) \rangle - \langle y - \pi_{\tau S}(y), \, \pi_{\tau S}(y) \rangle \le \langle (u - \pi_{\tau S}(u)) - (y - \pi_{\tau S}(y)), \, \pi_{\tau S}(u) \rangle$$

$$\le \|(\mathrm{I} - \pi_{\tau S})(u) - (\mathrm{I} - \pi_{\tau S})(y)\| \, \|\pi_{\tau S}(u)\| \le \|u - y\| \cdot (\tau B).$$

The last relation relies on the fact (3.4) that the map $\mathrm{I} - \pi_{\tau S}$ is nonexpansive. Reversing the roles of $u$ and $y$
in the last calculation, we see that

$$\big| \langle u - \pi_{\tau S}(u), \, \pi_{\tau S}(u) \rangle - \langle y - \pi_{\tau S}(y), \, \pi_{\tau S}(y) \rangle \big| \le (\tau B) \cdot \|u - y\|.$$

Combining this estimate with the expression (B.3) for the derivative, we reach

$$|F_u'(\tau) - F_y'(\tau)| \le 2B \cdot \|u - y\|.$$

For $\tau = 0$, the result follows when we take the limit as $\tau \downarrow 0$. □

With this result at hand, we are prepared to prove a lemma that confirms the remaining claims from
Proposition 4.4.
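A quick numerical sanity check of Lemma B.1 is possible when projections onto $\tau S$ are explicit. The Python sketch below assumes, purely for illustration, that $S$ is a Euclidean ball of radius `rad` centered at `c` with `rad` smaller than the norm of `c`, so that $S$ omits the origin and $B$ equals the norm of `c` plus `rad`. It evaluates $F_u$ and the derivative formula (B.3), which can then be compared against finite differences. The function names are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def project_dilated_ball(u, tau, c, rad):
    """Projection of u onto tau * S, where S is the ball with centre c and radius rad."""
    centre = [tau * a for a in c]
    diff = [a - b for a, b in zip(u, centre)]
    n = norm(diff)
    if n <= tau * rad:
        return list(u)   # u already lies in tau * S
    return [b + tau * rad * a / n for a, b in zip(diff, centre)]


def F(u, tau, c, rad):
    """F_u(tau) = dist^2(u, tau * S) from (B.1)."""
    p = project_dilated_ball(u, tau, c, rad)
    return sum((a - b) ** 2 for a, b in zip(u, p))


def dF(u, tau, c, rad):
    """Derivative formula (B.3), valid for tau > 0."""
    p = project_dilated_ball(u, tau, c, rad)
    return -(2.0 / tau) * sum((a - b) * b for a, b in zip(u, p))
```

Comparing `dF` with a centered difference quotient of `F`, and checking the bound (B.4), gives an inexpensive test of the signs and constants in the lemma.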

Lemma B.2 (Expected distance to a dilated set). Let $S$ be a nonempty, compact, convex subset of $\mathbb{R}^d$ that does
not contain the origin. Then the function

$$F(\tau) := \mathbb{E}[\operatorname{dist}^2(g, \tau S)] = \mathbb{E}[F_g(\tau)] \quad \text{for } \tau \ge 0$$

is strictly convex, continuous at $\tau = 0$, and differentiable for $\tau \ge 0$. It attains its minimum at a unique point.
Furthermore,

$$F'(\tau) = \mathbb{E}[F_g'(\tau)] \quad \text{for all } \tau \ge 0. \tag{B.7}$$

For $\tau = 0$, we interpret $F'(\tau)$ as a right derivative.

Proof. These properties will follow from Lemma B.1, and we continue using the notation from this result.

Convexity. The function $F$ is convex for $\tau \ge 0$ because it is an average of functions of the form $F_g$, each of
which is convex.

Strict convexity. We argue by contradiction. Since $F$ is convex, if $F$ were not strictly convex, its graph would
contain a linear segment. More precisely, there would be numbers $0 \le \rho < \tau$ and $\eta \in (0, 1)$ for which

$$\mathbb{E}\big[ F_g(\eta \rho + (1 - \eta) \tau) \big] = \mathbb{E}\big[ \eta \cdot F_g(\rho) + (1 - \eta) \cdot F_g(\tau) \big]. \tag{B.8}$$

The convexity of $F_g$ ensures that, for each $g$, the bracket on the right-hand side is no smaller than the bracket
on the left-hand side. Therefore, the relation (B.8) holds if and only if the two brackets are equal almost
surely with respect to the Gaussian measure. But note that

$$F_0(\eta \rho + (1 - \eta) \tau) = \operatorname{dist}^2\big(0, (\eta \rho + (1 - \eta) \tau) S\big) = (\eta \rho + (1 - \eta) \tau)^2 \cdot \inf_{s \in S} \|s\|^2$$

$$< (\eta \rho^2 + (1 - \eta) \tau^2) \cdot \inf_{s \in S} \|s\|^2 = \eta \cdot \operatorname{dist}^2(0, \rho S) + (1 - \eta) \cdot \operatorname{dist}^2(0, \tau S) = \eta \cdot F_0(\rho) + (1 - \eta) \cdot F_0(\tau).$$

The strict inequality depends on the strict convexity of the square, together with the fact that the infimum is
strictly positive. On account of (3.5), the squared distance to a convex set is a continuous function, so there is
an open ball around the origin where the same relation holds. That is, for some $\varepsilon > 0$,

$$F_u(\eta \rho + (1 - \eta) \tau) < \eta \cdot F_u(\rho) + (1 - \eta) \cdot F_u(\tau) \quad \text{when } \|u\| < \varepsilon.$$

This statement contravenes (B.8).

Continuity at zero. Imitating the continuity argument in Lemma B.1, we find that

$$|F(\varepsilon) - F(0)| = \big| \mathbb{E}[F_g(\varepsilon) - F_g(0)] \big| \le \mathbb{E}\big[ 2 \|g\| \, \|\pi_{\varepsilon S}(g)\| + \|\pi_{\varepsilon S}(g)\|^2 \big] \le 2\sqrt{d} \cdot (\varepsilon B) + (\varepsilon B)^2 \to 0 \quad \text{as } \varepsilon \to 0.$$

This is all that is required.

Differentiability. This point follows from a routine application of the Dominated Convergence Theorem.
Indeed, for every $\tau \ge 0$, the function $F(\tau) = \mathbb{E}[F_g(\tau)]$ takes a finite value, and Lemma B.1 establishes that $F_g$ is
continuously differentiable. For each compact interval $I$, the bound (B.4) ensures that

$$\mathbb{E} \sup_{\tau \in I} |F_g'(\tau)| \le \mathbb{E} \sup_{\tau \in I} \big[ 2B (\|g\| + \tau B) \big] \le 2B\sqrt{d} + 2B^2 \sup_{\tau \in I} \tau < \infty.$$

The convergence theorem now implies that $F'(\tau) = \frac{d}{d\tau} \mathbb{E}[F_g(\tau)] = \mathbb{E}[F_g'(\tau)]$ for all $\tau \ge 0$.

Attainment of minimum. The median of the random variable $\|g\|$ does not exceed $\sqrt{d}$. Therefore, when
$\tau b \ge \sqrt{d}$, we have

$$F(\tau) \ge \mathbb{E}\big[ F_g(\tau) \,\big|\, \|g\| \le \sqrt{d} \big] \cdot \mathbb{P}\{ \|g\| \le \sqrt{d} \} \ge \tfrac{1}{2} \mathbb{E}\big[ (\tau b - \|g\|)^2 \,\big|\, \|g\| \le \sqrt{d} \big] \ge \tfrac{1}{2} (\tau b - \sqrt{d})^2.$$

The first inequality follows from the law of total expectation, and the second depends on (B.2). In particular,
$F(\tau) > F(0) = d$ when $\tau > 2 b^{-1} \sqrt{d}$. Thus, any minimizer of $F$ must occur in the compact interval $[0, 2 b^{-1} \sqrt{d}]$.
Since $F$ is strictly convex and continuous, it attains its minimum at a unique point. □

B.2. Error bound for descent cone calculations. In this section, we prove Theorem 4.5, which provides an
error bound for Proposition 4.4. We require a standard result concerning the variance of a Lipschitz function
of a standard normal vector.


Fact B.3 (Variance of a Lipschitz function). Let $H : \mathbb{R}^d \to \mathbb{R}$ be a function that is Lipschitz with respect to the
Euclidean norm:

$$|H(u) - H(y)| \le M \cdot \|u - y\| \quad \text{for all } u, y \in \mathbb{R}^d.$$

Then

$$\operatorname{Var}(H(g)) \le M^2, \tag{B.9}$$

where $g$ is a standard normal vector.

Fact B.3 is a consequence of the Gaussian Poincaré inequality; see [Bog98, Thm. 1.6.4] or [Led01, p. 49].
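For intuition, Fact B.3 can be observed empirically with the Euclidean norm, which is 1-Lipschitz, so the variance of the norm of $g$ should not exceed one in any dimension. The Python sketch below is a Monte Carlo illustration only. The choice of test function and the function name are assumptions made for this example.

```python
import math
import random


def sample_variance_of_norm(d, trials, rng):
    """Unbiased sample variance of ||g|| for g standard normal in R^d."""
    vals = []
    for _ in range(trials):
        vals.append(math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))))
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / (trials - 1)
```

With a few thousand samples the estimate sits comfortably below the bound $M^2 = 1$, independent of the dimension.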

Proof of Theorem 4.5. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a norm, and fix a nonzero point $x \in \mathbb{R}^d$. According to [HUL93a,
Ex. VI.3.1], the subdifferential of the norm satisfies

$$S = \partial f(x) = \{ s \in \mathbb{R}^d : \langle s, x \rangle = f(x) \text{ and } f^\circ(s) = 1 \}, \tag{B.10}$$

where $f^\circ$ is the norm dual to $f$. Thus, $S$ is nonempty, compact, convex, and it does not contain the origin.

As in Lemmas B.1 and B.2, we introduce the functions

$$F_u : \tau \mapsto \operatorname{dist}^2(u, \tau S) \quad \text{and} \quad F : \tau \mapsto \mathbb{E}[F_g(\tau)],$$

where $g$ is a standard normal vector. Proposition 4.4 provides the upper bound $\mathbb{E} \inf_{\tau \ge 0} F_g(\tau) \le \inf_{\tau \ge 0} F(\tau)$.
Our objective is to develop a reverse inequality.

We establish the result by linearizing each function $F_g$ around a suitable point. Lemma B.2 shows that the
function $F$ attains its minimum at a unique location, so we may define

$$\tau_\star := \operatorname*{argmin}_{\tau \ge 0} F(\tau).$$

Similarly, for each $u \in \mathbb{R}^d$, Lemma B.1 shows that $F_u$ attains its minimum at some point $\tau_u \ge 0$. For the
moment, it does not matter how we select this minimizer. Since $F_u$ is convex and differentiable, we can bound
its minimum value below using the tangent at $\tau_\star$. That is,

$$\inf_{\tau \ge 0} F_u(\tau) = F_u(\tau_u) \ge F_u(\tau_\star) + (\tau_u - \tau_\star) \cdot F_u'(\tau_\star).$$

Should $\tau_\star = 0$, we interpret $F_u'(\tau_\star)$ as a right derivative. Replacing $u$ by the random vector $g$ and taking the
expectation, we reach

$$\mathbb{E}\Big[ \inf_{\tau \ge 0} F_g(\tau) \Big] \ge \mathbb{E}[F_g(\tau_\star)] + \mathbb{E}\big[ (\tau_g - \tau_\star) \cdot F_g'(\tau_\star) \big]$$

$$= F(\tau_\star) + \mathbb{E}\big[ (\tau_g - \mathbb{E}[\tau_g]) \cdot (F_g'(\tau_\star) - \mathbb{E}[F_g'(\tau_\star)]) \big] + \mathbb{E}[\tau_g - \tau_\star] \cdot \mathbb{E}[F_g'(\tau_\star)]$$

$$\ge \inf_{\tau \ge 0} F(\tau) - \big[ \operatorname{Var}(\tau_g) \cdot \operatorname{Var}(F_g'(\tau_\star)) \big]^{1/2} + \mathbb{E}[\tau_g - \tau_\star] \cdot F'(\tau_\star). \tag{B.11}$$

The second inequality depends on Cauchy-Schwarz, and we have invoked (B.7) to identify the derivative of
$F$. From here, we obtain the conclusion (4.10) as soon as we estimate the two error terms. The advantage of
this formulation is that the Lipschitz properties of the random variables allow us to control their variances.

First, let us demonstrate that the last term on the right-hand side of (B.11) is nonnegative. Abbreviate
$e_1 := \mathbb{E}[\tau_g - \tau_\star] \cdot F'(\tau_\star)$. There are two possibilities to consider. When $\tau_\star > 0$, the derivative $F'(\tau_\star) = 0$ because
$\tau_\star$ minimizes $F$. Thus, $e_1 = 0$. On the other hand, when $\tau_\star = 0$, it must be the case that the right derivative
$F'(\tau_\star) \ge 0$, or else the minimum of the convex function $F$ would occur at a strictly positive value. Since $\tau_g \ge 0$,
we see that the quantity $e_1 \ge 0$.

To compute the variance of $\tau_g$, we need to devise a consistent method for selecting a minimizer $\tau_u$ of $F_u$.
Introduce the closed convex cone $K := \operatorname{cone}(S)$, and notice that

$$\inf_{\tau \ge 0} F_u(\tau) = \inf_{\tau \ge 0} \operatorname{dist}^2(u, \tau S) = \operatorname{dist}^2(u, K).$$

In other words, the minimum distance to one of the sets $\tau S$ is attained at the point $\Pi_K(u)$. As such, it is
natural to pick a minimizer $\tau_u$ of $F_u$ according to the rule

$$\tau_u := \inf\{ \tau \ge 0 : \Pi_K(u) \in \tau S \} = \frac{\langle \Pi_K(u), x \rangle}{f(x)}. \tag{B.12}$$

The latter identity follows from the expression (B.10) for the subdifferential $S$. In light of (B.12),

$$|\tau_u - \tau_y| = \frac{|\langle \Pi_K(u) - \Pi_K(y), \, x \rangle|}{f(x)} \le \frac{\|x\|}{f(x)} \|\Pi_K(u) - \Pi_K(y)\| \le \frac{1}{f(x / \|x\|)} \|u - y\|.$$

We have used the fact (3.4) that the projection onto a closed convex set is nonexpansive. Fact B.3 delivers

$$[\operatorname{Var}(\tau_g)]^{1/2} \le \frac{1}{f(x / \|x\|)}. \tag{B.13}$$

Finally, let us turn to the remaining variance term in (B.11). We have already computed the Lipschitz
bound we need for the analysis. Indeed, the inequality (B.5) states that

$$|F_u'(\tau_\star) - F_y'(\tau_\star)| \le \Big[ 2 \sup_{s \in S} \|s\| \Big] \cdot \|u - y\|.$$

Another invocation of Fact B.3 delivers the estimate

$$[\operatorname{Var}(F_g'(\tau_\star))]^{1/2} \le 2 \sup_{s \in S} \|s\|. \tag{B.14}$$

To complete the proof, we combine the inequalities (B.11), (B.13), (B.14), and the fact that $e_1 \ge 0$. This is
the advertised result (4.10). □


Appendix  C.  Statistical  dimension  calculations 

This appendix contains the details of the calculations of the statistical dimension for several families of
convex cones: circular cones, $\ell_1$ descent cones, and Schatten 1-norm descent cones.


C.1. Circular cones. First, we approximate the statistical dimension of a circular cone.


Proof of Proposition 4.3. We begin with an exact integral expression for the statistical dimension of the circular
cone $C = \mathrm{Circ}_d(\alpha)$. The spherical formulation (4.2) of the statistical dimension asks us to average the squared
norm of the projection of a random unit vector $\theta$ onto the cone. Introduce the angle $\varphi = \varphi(\theta) := \arccos(\theta_1)$
between $\theta$ and the first standard basis vector $(1, 0, \dots, 0)$. Elementary trigonometry shows that the squared
norm of the projection of $\theta$ onto the cone $C$ admits the expression

$$F(\varphi) := \|\Pi_C(\theta)\|^2 = \begin{cases} 1, & 0 \le \varphi < \alpha, \\ \cos^2(\varphi - \alpha), & \alpha \le \varphi < \tfrac{\pi}{2} + \alpha, \\ 0, & \tfrac{\pi}{2} + \alpha \le \varphi \le \pi. \end{cases}$$

To obtain the exact statistical dimension $\delta(C)$ from (4.2), we integrate $F(\varphi)$ in polar coordinates in the usual
way (cf. [SW08, Lem. 6.5.1]):

$$\delta(C) = d \cdot \frac{\Gamma(\tfrac{1}{2} d)}{\sqrt{\pi} \, \Gamma(\tfrac{1}{2}(d - 1))} \int_0^\pi \sin^{d-2}(\varphi) \, F(\varphi) \, d\varphi. \tag{C.1}$$

We can approximate the integral by a routine application of Laplace's method [AF03, Lem. 6.2.3]:

$$\int_0^\pi \sin^{d-2}(\varphi) \, F(\varphi) \, d\varphi = \sqrt{\frac{2\pi}{d - 2}} \, F\big(\tfrac{\pi}{2}\big) + O(d^{-3/2}).$$

To simplify the ratio of gamma functions, recall Gautschi's inequality [OLBC10, Sec. 5.6.4]:

$$\sqrt{d - 2} < \frac{\sqrt{2} \, \Gamma(\tfrac{1}{2} d)}{\Gamma(\tfrac{1}{2}(d - 1))} < \sqrt{d}.$$

Combine the last three displays to reach the expression (4.7).

To obtain the more refined estimate $\cos(2\alpha)$ for the error term, one may use the fact that the intrinsic
volumes of a circular cone satisfy

$$v_k(\mathrm{Circ}_d(\alpha)) = \frac{1}{2} \binom{\tfrac{1}{2}(d - 2)}{\tfrac{1}{2}(k - 1)} \sin^{k-1}(\alpha) \cos^{d-k-1}(\alpha) \quad \text{for } k = 1, \dots, d - 1. \tag{C.2}$$

This formula is drawn from [Ame11, Ex. 4.4.8]. We are using the analytic extension to define the binomial
coefficient. The easiest way to study this sequence is to observe the close connection with the density of a
binomial random variable and to apply the interlacing result, Proposition 5.6. In the interest of brevity, we
omit the details. □
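The exact expression (C.1) is easy to evaluate numerically, which gives a direct way to compare it with the leading term $d \sin^2(\alpha)$. The Python sketch below uses a midpoint rule and `math.lgamma` for the ratio of gamma functions. The quadrature rule, the number of nodes, and the function name are choices made for this illustration.

```python
import math


def circ_stat_dim(d, alpha, n=20000):
    """Midpoint-rule evaluation of (C.1) for the circular cone Circ_d(alpha), d >= 3."""
    def F(phi):
        # Squared norm of the projection of a unit vector at angle phi onto the cone.
        if phi < alpha:
            return 1.0
        if phi < math.pi / 2 + alpha:
            return math.cos(phi - alpha) ** 2
        return 0.0

    h = math.pi / n
    integral = 0.0
    for k in range(n):
        phi = (k + 0.5) * h
        integral += math.sin(phi) ** (d - 2) * F(phi)
    integral *= h
    # Gamma(d/2) / Gamma((d-1)/2), computed in log space to avoid overflow.
    ratio = math.exp(math.lgamma(d / 2) - math.lgamma((d - 1) / 2))
    return d * ratio / math.sqrt(math.pi) * integral
```

A convenient check is the half-space $\alpha = \pi/2$, whose statistical dimension is $d - \tfrac{1}{2}$. For moderate $d$ and a generic angle, the value stays within a bounded distance of $d \sin^2(\alpha)$.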


C.2. Descent cones of the $\ell_1$ norm. In this section, we show how to use Recipe 4.1 and Theorem 4.5 to
compute the statistical dimension of the descent cone of the $\ell_1$ norm at a sparse vector. This is a warmup for
the more difficult, but entirely similar, calculation in the next section.


Proof of Proposition 4.7. Since the $\ell_1$ norm is invariant under signed permutations, we may assume that the
sparse vector $x \in \mathbb{R}^d$ takes the form $x = (x_1, \dots, x_s, 0, \dots, 0)$, where $x_i > 0$. To compute $\delta(\mathcal{D}(\|\cdot\|_1, x))$, we use the
subdifferential bound (4.8) for the statistical dimension of a descent cone:

$$\delta(\mathcal{D}(\|\cdot\|_1, x)) \le \inf_{\tau \ge 0} \mathbb{E}\big[ \operatorname{dist}^2(g, \tau \cdot \partial \|x\|_1) \big]. \tag{C.3}$$

Observe that the subdifferential of the $\ell_1$ norm at $x$ has the following structure:

$$u \in \partial \|x\|_1 \iff \begin{cases} u_i = 1, & i = 1, \dots, s, \\ |u_i| \le 1, & i = s + 1, \dots, d. \end{cases} \tag{C.4}$$

We can compute the distance from a standard normal vector $g$ to the dilated subdifferential as follows.

$$\operatorname{dist}^2(g, \tau \cdot \partial \|x\|_1) = \sum_{i=1}^{s} (g_i - \tau)^2 + \sum_{i=s+1}^{d} \operatorname{Pos}^2(|g_i| - \tau),$$

where $\operatorname{Pos}(a) := a \vee 0$ and the operator $\vee$ returns the maximum of two numbers. Indeed, we always suffer an
error in the first $s$ components, and we can always reduce the magnitude of the other components by the
amount $\tau$. Taking the expectation, we reach

$$\mathbb{E}\big[ \operatorname{dist}^2(g, \tau \cdot \partial \|x\|_1) \big] = s (1 + \tau^2) + (d - s) \sqrt{\frac{2}{\pi}} \int_\tau^\infty (u - \tau)^2 e^{-u^2/2} \, du. \tag{C.5}$$

Simplify the integral, and introduce the resulting expression into (C.3). We reach

$$\delta(\mathcal{D}(\|\cdot\|_1, x)) \le \inf_{\tau \ge 0} \Big\{ s (1 + \tau^2) + \sqrt{\frac{2}{\pi}} \, (d - s) \Big[ (1 + \tau^2) \int_\tau^\infty e^{-u^2/2} \, du - \tau e^{-\tau^2/2} \Big] \Big\}. \tag{C.6}$$

This expression coincides with the upper bound in (4.12), which has been normalized by the ambient
dimension $d$.

Now, we need to invoke the error estimate, Theorem 4.5. An inspection of (C.4) shows that the subdifferential
$\partial \|x\|_1$ depends on the number $s$ of nonzero entries in $x$ but not on their magnitudes. It follows
from (3.3) that, up to isometry, the descent cone $\mathcal{D}(\|\cdot\|_1, x)$ only depends on the sparsity. Therefore, we may
as well assume that $x = (1, \dots, 1, 0, \dots, 0)$. For this vector, $\| x / \|x\| \|_1 = \sqrt{s}$. Moreover, the expression (C.4) for the
subdifferential shows that $\|u\| \le \sqrt{d}$ for every subgradient $u \in \partial \|x\|_1$. Therefore, the error in the inequality (C.6)
is at most $2\sqrt{d/s}$. We reach the lower bound in (4.12).

Finally, Lemma B.2 shows that the brace in (C.6) is a strictly convex, differentiable function of $\tau$ with a
unique minimizer. It can be verified that the minimum does not occur at $\tau = 0$. Therefore, we determine the
stationary equation (4.13) by setting the derivative of the brace to zero and simplifying. □
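The distance formula and its expectation can be checked against each other by simulation. The Python sketch below evaluates the squared distance to the dilated subdifferential for the canonical vector with $s$ leading positive entries, together with the closed form obtained by simplifying the Gaussian integral in (C.5) via the complementary error function. The function names are hypothetical, and the comparison is a Monte Carlo illustration under these assumptions.

```python
import math
import random


def dist2_to_dilated_subdiff(g, s, tau):
    """Squared distance from g to tau times the l1 subdifferential at (x_1..x_s > 0, 0..0)."""
    head = sum((gi - tau) ** 2 for gi in g[:s])                  # coordinates pinned to tau
    tail = sum(max(abs(gi) - tau, 0.0) ** 2 for gi in g[s:])     # Pos^2(|g_i| - tau)
    return head + tail


def expected_dist2(d, s, tau):
    """Closed form of the expectation: the brace in (C.6)."""
    tail = ((1.0 + tau * tau) * math.erfc(tau / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * tau * math.exp(-tau * tau / 2.0))
    return s * (1.0 + tau * tau) + (d - s) * tail
```

Averaging `dist2_to_dilated_subdiff` over many standard normal draws should reproduce `expected_dist2` up to sampling error.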


C.3. Descent cones of the Schatten 1-norm. Now, we present the calculation of the statistical dimension
of the descent cone of the Schatten 1-norm at a low-rank matrix. The approach is entirely similar to the
argument in Appendix C.2.


Proof of Proposition 4.8. Our aim is to identify the statistical dimension of the descent cone of the Schatten
1-norm at a fixed low-rank matrix. The argument here parallels the proof of Proposition 4.7, but we use
classical results from random matrix theory to obtain the final expression. Our asymptotic theory demonstrates
that this simplification still results in a sharp estimate.

We begin with the fixed-dimension setting. Consider an $m \times n$ real matrix $X$ with rank $r$. Without loss of
generality, we assume that $m \le n$ and $0 < r \le m$. The Schatten 1-norm is unitarily invariant, so we can also
assume that $X$ takes the form

$$X = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \quad \text{where } \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_r) \text{ and } \sigma_i > 0 \text{ for } i = 1, \dots, r.$$

The subdifferential bound (4.8) for the statistical dimension of a descent cone states that

$$\delta(\mathcal{D}(\|\cdot\|_{S_1}, X)) \le \inf_{\tau \ge 0} \mathbb{E}\big[ \operatorname{dist}^2(G, \tau \cdot \partial \|X\|_{S_1}) \big], \tag{C.7}$$

where we compute distance with respect to the Frobenius norm. The $m \times n$ matrix $G$ has independent standard
normal entries, and it is partitioned conformally with $X$:

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \quad \text{where } G_{11} \text{ is } r \times r \text{ and } G_{22} \text{ is } (m - r) \times (n - r).$$


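According to [Wat92, Ex. 2], the subdifferential of the Schatten 1-norm at $X$ takes the form

$$\partial \|X\|_{S_1} = \left\{ \begin{bmatrix} \mathrm{I}_r & 0 \\ 0 & W \end{bmatrix} : \sigma_1(W) \le 1 \right\}, \tag{C.8}$$

where $\sigma_1(W)$ denotes the maximum singular value of $W$. It follows that

$$\operatorname{dist}^2(G, \tau \cdot \partial \|X\|_{S_1}) = \left\| \begin{bmatrix} G_{11} - \tau \mathrm{I}_r & G_{12} \\ G_{21} & 0 \end{bmatrix} \right\|_F^2 + \inf_{\sigma_1(W) \le 1} \|G_{22} - \tau W\|_F^2.$$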

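Using the Hoffman-Wielandt Theorem [HJ90, Cor. 7.3.8], we can derive

$$\inf_{\sigma_1(W) \le 1} \|G_{22} - \tau W\|_F^2 = \inf_{\sigma_1(W) \le 1} \sum_{i=1}^{m-r} \big( \sigma_i(G_{22}) - \tau \sigma_i(W) \big)^2 = \sum_{i=1}^{m-r} \operatorname{Pos}^2(\sigma_i(G_{22}) - \tau),$$

where $\sigma_i(\cdot)$ is the $i$th largest singular value. Combining the last two displays and taking the expectation,

$$\mathbb{E}\big[ \operatorname{dist}^2(G, \tau \cdot \partial \|X\|_{S_1}) \big] = r (m + n - r + \tau^2) + \mathbb{E}\Big[ \sum_{i=1}^{m-r} \operatorname{Pos}^2(\sigma_i(G_{22}) - \tau) \Big]. \tag{C.9}$$

Introduce this expression into (C.7):

$$\delta(\mathcal{D}(\|\cdot\|_{S_1}, X)) \le \inf_{\tau \ge 0} \Big\{ r (m + n - r + \tau^2) + \mathbb{E}\Big[ \sum_{i=1}^{m-r} \operatorname{Pos}^2(\sigma_i(G_{22}) - \tau) \Big] \Big\}. \tag{C.10}$$

We reach a nonasymptotic bound on the statistical dimension.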

Next, we apply the error bound, Theorem 4.5. The expression (C.8) shows that the subdifferential $\partial \|X\|_{S_1}$
depends only on the rank of the matrix $X$ and its dimension, so the descent cone $\mathcal{D}(\|\cdot\|_{S_1}, X)$ has the same
invariance. Therefore, we may consider the $m \times n$ rank-$r$ matrix $X = \mathrm{I}_r \oplus 0$, which verifies $\| X / \|X\|_F \|_{S_1} = \sqrt{r}$.
Each subgradient $Y \in \partial \|X\|_{S_1}$ satisfies the norm bound $\|Y\|_F \le \sqrt{m}$. We conclude that the error in (C.10) is no
worse than $2\sqrt{m/r}$.

It is challenging to evaluate the formula (C.10) exactly. In principle, we could accomplish this task using
the joint singular value density [And84, p. 534] of the Gaussian matrix $G_{22}$. Instead, we set up a framework
in which we can use classical random matrix theory to obtain a sharp asymptotic result.

Consider an infinite sequence $\{X(r, m, n)\}$ of matrices, where $X(r, m, n)$ has rank $r$ and dimension $m \times n$ with
$m \le n$. For simplicity, we assume that the problem parameters $r, m, n \to \infty$ with constant ratios $r/m =: \rho \in (0, 1)$
and $m/n =: \nu \in (0, 1]$. The general case follows from a continuity argument. After a change of variables
$\tau \mapsto \tau \sqrt{n - r}$ and a rescaling, the expression (C.9) leads to

$$\frac{1}{mn} \mathbb{E}\big[ \operatorname{dist}^2(G, \tau \sqrt{n - r} \cdot \partial \|X(r, m, n)\|_{S_1}) \big] = \rho\nu + \rho (1 - \rho\nu)(1 + \tau^2) + (1 - \rho)(1 - \rho\nu) \cdot \mathbb{E}\Big[ \frac{1}{m - r} \sum_{i=1}^{m-r} \operatorname{Pos}^2(\sigma_i(Z) - \tau) \Big]. \tag{C.11}$$

Here, $G$ is an $m \times n$ standard normal matrix. The matrix $Z$ has dimension $(m - r) \times (n - r)$, and its entries are
independent normal$(0, (n - r)^{-1})$ random variables.

Observe  that  the  expectation  in  (C.ll)  can  be  viewed  as  a  spectral  function  of  a  Gaussian  matrix.  We  can 
obtain  the  limiting  value  of  this  expectation  from  a  variant  of  the  Marcenko-Pastur  Law  [MP67] . 


Fact C.1 (Spectral functions of a Gaussian matrix). Fix a continuous function $F : \mathbb{R}_+ \to \mathbb{R}$. Suppose $p, q \to \infty$
and $p/q \to y \in (0, 1]$. Let $Z_{pq}$ be a $p \times q$ matrix with independent normal$(0, q^{-1})$ entries. Then

$$\mathbb{E}\Big[ \frac{1}{p} \sum_{i=1}^{p} F(\sigma_i(Z_{pq})) \Big] \to \int_{a_-}^{a_+} F(u) \cdot \varphi_y(u) \, du.$$

The limits $a_\pm := 1 \pm \sqrt{y}$. The kernel $\varphi_y$ is a probability density supported on $[a_-, a_+]$:

$$\varphi_y(u) := \frac{1}{\pi y u} \sqrt{(u^2 - a_-^2)(a_+^2 - u^2)} \quad \text{for } u \in [a_-, a_+].$$

Fact C.1 is usually stated differently in the literature. The result here follows from the almost sure weak
convergence of the empirical spectral density of a sample covariance matrix to the Marcenko-Pastur density
[BS10, Thm. 3.6] and the almost sure convergence of the extreme eigenvalues of a sample covariance
matrix [BS10, Thm. 5.8], followed by a change of variables in the integral. We omit the uninteresting details
of this reduction.

Let  us  apply  Fact  C.l  to  our  problem.  The  limiting  aspect  ratio  y  of  the  matrix  Z  satisfies 


m  —  r  v(l-p) 

y= - =  -; - ■ 

n—r  1  -  pv 

As  r,  m,  n  — <•  oo,  we  obtain  the  limit 


E 


£  P0S2((7;(Z)-T) 


1=1 


f 


Pos 2{u-t)  • (py[u)du . 


Simplifying the latter integral and introducing it into (C.11), we reach

$$\frac{1}{mn} \, \mathbb{E}\left[ \mathrm{dist}^2\big(G, \tau\sqrt{n - r} \cdot \partial\|X(r, m, n)\|_{S_1}\big) \right] \to \rho\nu + \rho(1 - \rho\nu)(1 + \tau^2) + (1 - \rho)(1 - \rho\nu) \int_{a_- \vee \tau}^{a_+} (u - \tau)^2 \cdot \varphi_y(u) \, du.$$

Rescaling the error estimate for (C.10), we see that the error in the normalized statistical dimension is at
most $2/(n\sqrt{mr})$, which converges to zero as the parameters grow. We obtain the asymptotic result

$$\frac{1}{mn} \, \delta\big(\mathcal{D}(\|\cdot\|_{S_1}, X(r, m, n))\big) \to \inf_{\tau \ge 0} \left\{ \rho\nu + \rho(1 - \rho\nu)(1 + \tau^2) + (1 - \rho)(1 - \rho\nu) \int_{a_- \vee \tau}^{a_+} (u - \tau)^2 \cdot \varphi_y(u) \, du \right\}.$$

This is the main conclusion (4.14). To obtain the stationary equation (4.16), we differentiate the brace with
respect to $\tau$ and set the derivative to zero. □
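The limiting expression can be evaluated numerically. The sketch below is one possible approach, not the authors' procedure: it computes the brace by midpoint quadrature and minimizes over $\tau$ with a golden-section search on $[0, a_+]$. The names `brace` and `psi` are hypothetical, and the search assumes the brace is unimodal in $\tau$ on that interval (it is increasing for $\tau \ge a_+$, where the integral vanishes).

```python
import math

def mp_density(u, y):
    """Kernel phi_y(u) from Fact C.1; zero outside (a_-, a_+)."""
    a_minus, a_plus = 1.0 - math.sqrt(y), 1.0 + math.sqrt(y)
    if not (a_minus < u < a_plus):
        return 0.0
    return math.sqrt((u * u - a_minus ** 2) * (a_plus ** 2 - u * u)) / (math.pi * y * u)

def brace(tau, rho, nu, steps=4000):
    """Expression inside the infimum, for 0 < rho < 1 and 0 < nu <= 1."""
    y = nu * (1.0 - rho) / (1.0 - rho * nu)
    a_minus, a_plus = 1.0 - math.sqrt(y), 1.0 + math.sqrt(y)
    lower = max(a_minus, tau)  # lower limit a_- v tau
    integral = 0.0
    if lower < a_plus:
        h = (a_plus - lower) / steps
        for i in range(steps):
            u = lower + (i + 0.5) * h
            integral += (u - tau) ** 2 * mp_density(u, y)
        integral *= h
    return (rho * nu
            + rho * (1.0 - rho * nu) * (1.0 + tau ** 2)
            + (1.0 - rho) * (1.0 - rho * nu) * integral)

def psi(rho, nu, iters=60):
    """Golden-section search for the minimizing tau; returns (tau, value)."""
    y = nu * (1.0 - rho) / (1.0 - rho * nu)
    lo, hi = 0.0, 1.0 + math.sqrt(y)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = brace(c, rho, nu), brace(d, rho, nu)
    for _ in range(iters):
        if fc < fd:   # minimizer lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = brace(c, rho, nu)
        else:         # minimizer lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = brace(d, rho, nu)
    tau = 0.5 * (lo + hi)
    return tau, brace(tau, rho, nu)

if __name__ == "__main__":
    tau_star, value = psi(rho=0.2, nu=0.5)
    print(f"tau = {tau_star:.4f}, limiting normalized statistical dimension = {value:.4f}")
```

At $\tau = 0$ the brace equals 1 because the second moment of $\varphi_y$ is 1, which gives a simple check on the quadrature.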


C.4.  Permutahedra  and  finite  reflection  groups.  In  this  section,  we  use  a  deep  connection  between  conic 
geometry  and  classical  combinatorics  to  compute  the  statistical  dimension  of  the  normal  cone  of  a  (signed) 
permutahedron.  This  computation  requires  the  full  power  of  Proposition  5.11,  the  characterization  of  the 
statistical  dimension  as  the  mean  intrinsic  volume  of  a  cone. 

A finite reflection group is a finite subgroup $\mathcal{G}$ of the orthogonal group $O_d$ that is generated by reflections
across hyperplanes [ST54, Ste59, CM72, BB10]. Each finite reflection group $\mathcal{G}$ partitions $\mathbb{R}^d$ into a set
$\{ U C_{\mathcal{G}} : U \in \mathcal{G} \}$ of polyhedral cones called chambers. The chambers of the infinite families $A_{d-1}$ and $BC_d$
of irreducible finite reflection groups are isometric to the cones

$$C_A := \{ x \in \mathbb{R}^d : x_1 \le \cdots \le x_d \} \quad \text{and} \quad C_{BC} := \{ x \in \mathbb{R}^d : 0 \le x_1 \le \cdots \le x_d \}.$$

It turns out that the chambers $C_A$ and $C_{BC}$ coincide with the normal cones of certain permutahedra.

Fact C.2 (Normal cones of permutahedra). Suppose that the vector $x$ has distinct entries. Then the normal cone
$\mathcal{N}(\mathcal{P}(x), x)$ is isometric to $C_A$ and the normal cone $\mathcal{N}(\mathcal{P}_\pm(x), x)$ is isometric to $C_{BC}$.


See  [HLT11,  Sec.  2]  for  a  proof  of  Fact  C.2. 

We claim that the statistical dimensions of the chambers $C_A$ and $C_{BC}$ can be expressed as

$$\delta(C_A) = H_d \quad \text{and} \quad \delta(C_{BC}) = \tfrac{1}{2} H_d, \tag{C.12}$$

where $H_d := \sum_{i=1}^{d} i^{-1}$ is the $d$th harmonic number. Proposition 4.9 follows immediately when we combine this
statement with Fact C.2.

Let us explain how the theory of finite reflection groups allows us to deduce the expression (C.12) for
the statistical dimension of the chambers. First, it follows from [BZ09] and the characterization [SW08,
Eq. (6.50)] of intrinsic volumes in terms of polytope angles that

$$v_k(C_{\mathcal{G}}) = \frac{|\mathcal{H}_k|}{|\mathcal{G}|},$$

where $\mathcal{H}_k$ is the subset of $\mathcal{G}$ consisting of matrices with a $(d - k)$-dimensional null space. Define the generating
polynomial of the intrinsic volumes

$$q_{\mathcal{G}}(s) := \sum_{k=0}^{d} v_k(C_{\mathcal{G}}) \, s^k = \frac{1}{|\mathcal{G}|} \sum_{k=0}^{d} |\mathcal{H}_k| \, s^k.$$

This polynomial is a well-studied object in the theory of finite reflection groups, and it has many applications
in conic geometry as well [Ame11, Sec. 4.4]. For our purposes, we only need the relationships

$$q_{\mathcal{G}}(1) = 1 \quad \text{and} \quad q'_{\mathcal{G}}(1) = \sum_{k=0}^{d} k \, v_k(C_{\mathcal{G}}) = \delta(C_{\mathcal{G}}).$$


These  points  follow  immediately  from  (5.1)  and  Proposition  5.11. 

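The roots $\{ \zeta_k : k = 1, \dots, d \}$ of the polynomial $q_{\mathcal{G}}$ are called the (negative) exponents of the reflection
group [CM72, Sec. 7.9]. Factoring the generating polynomial, we obtain a concise expression for the
statistical dimension:

$$\delta(C_{\mathcal{G}}) = q'_{\mathcal{G}}(1) = \left( \frac{1}{|\mathcal{G}|} \prod_{k=1}^{d} (1 - \zeta_k) \right) \sum_{k=1}^{d} \frac{1}{1 - \zeta_k} = \sum_{k=1}^{d} \frac{1}{1 - \zeta_k}.$$

We can deduce the value of the large parenthesis because of the normalization $q_{\mathcal{G}}(1) = 1$. The exponents
associated with the groups $A_{d-1}$ and $BC_d$ are collected in [CM72, Tab. 10], from which it follows immediately
that

$$\delta(C_A) = \sum_{k=1}^{d} \frac{1}{k} \quad \text{and} \quad \delta(C_{BC}) = \frac{1}{2} \sum_{k=1}^{d} \frac{1}{k}.$$

This completes the proof of the claim (C.12).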
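The factoring argument can be checked in exact arithmetic. The sketch below builds the generating polynomial from a list of roots, normalizes it so that its value at $s = 1$ is one, and compares the mean of the resulting coefficient sequence with the harmonic-number formula. The root lists are stated here as assumptions rather than quoted from [CM72, Tab. 10]: the negatives of $0, 1, \dots, d-1$ for $A_{d-1}$ acting on $\mathbb{R}^d$, and the negatives of $1, 3, \dots, 2d-1$ for $BC_d$. All function names are hypothetical.

```python
from fractions import Fraction

def intrinsic_volumes_from_roots(roots):
    """Coefficients (lowest degree first) of prod_k (s - root_k), scaled to sum to 1."""
    coeffs = [Fraction(1)]
    for r in roots:
        new = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] -= r * c      # contribution of the constant term -r
            new[k + 1] += c      # contribution of the factor s
        coeffs = new
    total = sum(coeffs)          # value of the unnormalized polynomial at s = 1
    return [c / total for c in coeffs]

def statistical_dimension(volumes):
    """Mean of the distribution defined by the intrinsic volumes."""
    return sum(k * v for k, v in enumerate(volumes))

def harmonic(d):
    return sum(Fraction(1, i) for i in range(1, d + 1))

if __name__ == "__main__":
    d = 6
    roots_A = [-k for k in range(d)]                   # assumed roots for A_{d-1}
    roots_BC = [-(2 * k - 1) for k in range(1, d + 1)]  # assumed roots for BC_d
    vol_A = intrinsic_volumes_from_roots(roots_A)
    vol_BC = intrinsic_volumes_from_roots(roots_BC)
    print(statistical_dimension(vol_A), harmonic(d))
    print(statistical_dimension(vol_BC), harmonic(d) / 2)
```

Because the arithmetic uses `Fraction`, the two printed pairs agree exactly rather than up to rounding.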


Appendix  D.  Technical  lemmas  for  concentration  of  intrinsic  volumes 


This  appendix  contains  the  technical  results  that  undergird  the  proof  of  the  result  on  concentration  of 
intrinsic  volumes,  Theorem  6.1,  and  the  approximate  kinematic  bound,  Theorem  7.1. 

D.1. Interlacing of tail functionals. First, we establish the interlacing inequality for the tail functionals.
This result is a straightforward consequence of the Crofton formula (5.10).

Proof of Proposition 5.6. Let $L_{d-k+1}$ be a linear subspace of dimension $d - k + 1$, and let $L_{d-k}$ be a linear
subspace of dimension $d - k$ inside $L_{d-k+1}$. The Crofton formula (5.10) shows that the half-tail functionals
are weakly decreasing:

$$2 h_{k+1}(C) = \mathbb{P}\{ C \cap Q L_{d-k} \ne \{0\} \} \le \mathbb{P}\{ C \cap Q L_{d-k+1} \ne \{0\} \} = 2 h_k(C),$$

where the inequality follows from the containment of the subspaces. We can express the tail functional $t_k$ as
the average of the half-tail functionals:

$$\tfrac{1}{2} t_k(C) = \tfrac{1}{2} \big[ h_k(C) + h_{k+1}(C) \big].$$

Therefore, $2 h_k(C) \ge t_k(C) \ge 2 h_{k+1}(C)$. □
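The interlacing inequality is easy to observe on a cone whose intrinsic volumes are known in closed form. The sketch below uses the nonnegative orthant in $\mathbb{R}^d$, taking as given the standard fact that its intrinsic volumes are $v_k = 2^{-d}\binom{d}{k}$, and assuming the definitions $t_k = v_k + v_{k+1} + \cdots$ and $h_k = v_k + v_{k+2} + \cdots$ for the tail and half-tail functionals. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def orthant_intrinsic_volumes(d):
    """Assumed closed form: v_k = binom(d, k) / 2^d for the nonnegative orthant."""
    return [Fraction(comb(d, k), 2 ** d) for k in range(d + 1)]

def tail(v, k):
    """Tail functional t_k = v_k + v_{k+1} + ..."""
    return sum(v[k:], Fraction(0))

def half_tail(v, k):
    """Half-tail functional h_k = v_k + v_{k+2} + ..."""
    return sum(v[k::2], Fraction(0))

def interlacing_holds(v):
    """Check 2 h_k >= t_k >= 2 h_{k+1} for every index k."""
    return all(
        2 * half_tail(v, k) >= tail(v, k) >= 2 * half_tail(v, k + 1)
        for k in range(len(v))
    )

if __name__ == "__main__":
    v = orthant_intrinsic_volumes(4)
    for k in range(len(v)):
        print(k, 2 * half_tail(v, k), tail(v, k), 2 * half_tail(v, k + 1))
```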


D.2.  Bounds  for  tropic  functions.  We  continue  with  the  proof  of  Lemma  6.2,  which  provides  a  bound  on 
the  tropic  functions.  This  argument  is  based  on  an  approximation  formula  from  the  venerable  compendium  of 
Abramowitz  &  Stegun  [AS64,  Sec.  26.5.21]. 


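Fact D.1 (Approximation of beta distributions). Let $X$ be a BETA$(a, b)$ random variable. Assume that $a + b > 6$
and $(a + b - 1)(1 - x) \ge 0.8$. Define the quantity $y$ via the formula

$$y = \frac{3 \left[ w_1 \left( 1 - \frac{1}{9b} \right) - w_2 \left( 1 - \frac{1}{9a} \right) \right]}{\left[ w_1^2 / b + w_2^2 / a \right]^{1/2}} \quad \text{where } w_1 = (bx)^{1/3} \text{ and } w_2 = (a(1 - x))^{1/3}.$$

Then

$$\mathbb{P}\{ X \le x \} = \Phi(y) + \varepsilon(x) \quad \text{with } |\varepsilon(x)| < 5 \cdot 10^{-3}.$$

The function $\Phi$ represents the cumulative distribution of a standard normal random variable.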
Proof of Lemma 6.2. We need to show that the tropic function $I_k^d(k/d) \ge 0.3$ for all integers $k$ and $d$ that satisfy
$1 \le k \le d - 1$. (The cases $k = 0$ and $k = d$ are trivial.) To accomplish this goal, we represent the tropic function
in terms of a beta random variable, and we apply Fact D.1 to approximate its value.

Let $X \sim$ BETA$(a, b)$ with shape parameters $a = \tfrac{1}{2} k$ and $b = \tfrac{1}{2}(d - k)$. In particular, $\mathbb{E}[X] = k/d$. It is well
known [Art02, Sec. 2] that the tropic function can be expressed in terms of this random variable:

$$1 - I_k^d(k/d) = \mathbb{P}\{ \|\Pi_{L_k}(\theta)\|^2 < k/d \} = \mathbb{P}\{ X < \mathbb{E}[X] \}.$$

We must show that this probability is bounded above by 0.7.

First, the mean-median-mode inequality [vdVW93] implies that the mean $k/d$ of $X$ is smaller than the
median when $k \ge \tfrac{1}{2} d$, so the probability is bounded by 0.5 in this regime.

Second, the cases where $a + b \le 6$ correspond with the situation where $d \le 12$. We can enumerate the cases
where $d \le 12$ and $0 < k < d/2$. In each case, we verify numerically that the required probability is less than 0.7.

It is easy to check that for $a = \tfrac{1}{2} k$, $b = \tfrac{1}{2}(d - k)$, and $x = k/d$, the inequality $(a + b - 1)(1 - x) \ge 0.8$ holds when
$0 < k < d/2$ and $d > 12$. Instantiating the formula from Fact D.1 and simplifying, we reach


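$$y = \frac{\sqrt{2}}{3} \cdot \frac{d - 2k}{\sqrt{dk(d - k)}} \le \frac{\sqrt{2}}{3}.$$

Indeed, for each $d$, the extremal choice is $k = 1$. The function $\Phi$ is increasing, so we conclude that

$$\mathbb{P}\{ X < k/d \} \le \Phi\left( \frac{\sqrt{2}}{3} \right) + 5 \cdot 10^{-3} < 0.69.$$

The latter bound results from numerical computation. □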
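The simplification of $y$ and the final numerical bound can be reproduced with a few lines of code. The sketch below evaluates the formula of Fact D.1 at $a = k/2$, $b = (d-k)/2$, $x = k/d$, compares it with the simplified expression, and evaluates $\Phi(\sqrt{2}/3) + 5 \cdot 10^{-3}$ through the error function. The helper names and the tested range of $d$ are choices made for this illustration.

```python
import math

def normal_cdf(t: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def fact_d1_y(a: float, b: float, x: float) -> float:
    """The quantity y from Fact D.1."""
    w1 = (b * x) ** (1.0 / 3.0)
    w2 = (a * (1.0 - x)) ** (1.0 / 3.0)
    numerator = 3.0 * (w1 * (1.0 - 1.0 / (9.0 * b)) - w2 * (1.0 - 1.0 / (9.0 * a)))
    denominator = math.sqrt(w1 ** 2 / b + w2 ** 2 / a)
    return numerator / denominator

def simplified_y(k: int, d: int) -> float:
    """Simplified form of y after substituting a = k/2, b = (d-k)/2, x = k/d."""
    return (math.sqrt(2.0) / 3.0) * (d - 2 * k) / math.sqrt(d * k * (d - k))

if __name__ == "__main__":
    worst = 0.0
    for d in range(13, 200):
        for k in range(1, (d + 1) // 2):  # all integers k with 0 < k < d/2
            y_full = fact_d1_y(k / 2.0, (d - k) / 2.0, k / d)
            assert abs(y_full - simplified_y(k, d)) < 1e-9
            worst = max(worst, simplified_y(k, d))
    print(f"largest y found: {worst:.4f} (sqrt(2)/3 = {math.sqrt(2.0) / 3.0:.4f})")
    print(f"bound: {normal_cdf(math.sqrt(2.0) / 3.0) + 5e-3:.4f}")
```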


D.3.  The  projection  of  a  spherical  variable  onto  a  cone.  In  this  section,  we  establish  Lemma  6.3,  which 
controls  the  probability  that  a  spherical  random  variable  has  an  unusually  large  projection  on  a  cone.  Although 
the  lemma  is  framed  in  terms  of  a  spherical  random  variable,  it  is  cleaner  to  derive  the  result  using  Gaussian 
methods.  We  require  an  exponential  moment  inequality  [Bog98,  Cor.  1.7.9]  that  ultimately  depends  on  the 
Gaussian  logarithmic  Sobolev  inequality. 

Fact D.2 (Exponential moments for a function of a Gaussian variable). Suppose that $F : \mathbb{R}^d \to \mathbb{R}$ satisfies
$\mathbb{E} F(g) = 0$. Assume moreover that $F$ belongs to the Gaussian Sobolev class $H^1(\gamma_d)$, i.e., the squared norm of its
gradient is integrable with respect to the Gaussian measure. Then

$$\mathbb{E}\, e^{\xi F(g)} \le \left( \mathbb{E}\, e^{(\xi/4) \|\nabla F(g)\|^2} \right)^{2\xi/(1 - 2\xi)} \quad \text{when } 0 < \xi < \tfrac{1}{2}. \tag{D.1}$$

Using  this  exponential  moment  bound,  we  reach  an  elegant  estimate  for  the  moment  generating  function 
of  the  squared  projection  of  a  standard  normal  vector  onto  a  cone. 


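Sublemma D.3 (Exponential moment bounds). Let $K \in \mathcal{C}_d$ be a closed convex cone. Then

$$\mathbb{E}\, e^{\xi (\|\Pi_K(g)\|^2 - \delta(K))} \le \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 4|\xi|} \right) \quad \text{for } -\tfrac{1}{4} < \xi < \tfrac{1}{4}. \tag{D.2}$$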


Proof. The result is trivial when $\xi = 0$, so we may limit our attention to the case where the parameter $\xi$ is
strictly positive or strictly negative. First, suppose that $\xi > 0$. Consider the zero-mean function

$$F(g) = \|\Pi_K(g)\|^2 - \delta(K) \quad \text{with} \quad \|\nabla F(g)\|^2 = 4 \|\Pi_K(g)\|^2. \tag{D.3}$$

The gradient calculation follows from (3.10), and it is easy to see that $F \in H^1(\gamma_d)$ because the projection onto
a cone is a contraction. The exponential moment bound (D.1) delivers the estimate

$$\mathbb{E}\, e^{\xi F(g)} \le \left( \mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \right)^{2\xi/(1 - 2\xi)} = \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 2\xi} \right) \cdot \left( \mathbb{E}\, e^{\xi F(g)} \right)^{2\xi/(1 - 2\xi)}.$$

The second relation follows when we add and subtract $\xi \delta(K)$ in the exponential function. We have compared
the moment generating function of $F(g)$ with itself. Solving the relation, we obtain the inequality

$$\mathbb{E}\, e^{\xi F(g)} \le \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 4\xi} \right) \quad \text{for } 0 < \xi < \tfrac{1}{4}.$$

This is the bound (D.2) for the positive range of parameters.
Now, we turn to the negative range of parameters, which requires a more convoluted argument. To make
the analysis clearer, we continue to assume that $\xi > 0$, and we write the negation explicitly. Replacing $F$ by
$-F$, the exponential moment bound (D.1) yields

$$\mathbb{E}\, e^{-\xi F(g)} = \mathbb{E}\, e^{\xi (-F)(g)} \le \left( \mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \right)^{2\xi/(1 - 2\xi)}. \tag{D.4}$$

This time, we cannot identify a copy of the left-hand side on the right-hand side. Instead, let us run the
moment comparison argument directly on the remaining expectation:

$$\mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} = e^{\xi \delta(K)} \cdot \mathbb{E}\, e^{\xi F(g)} \le e^{\xi \delta(K)} \left( \mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \right)^{2\xi/(1 - 2\xi)}.$$

The last inequality follows from the exponential moment bound (D.1), just as before. Solving this relation,
we obtain

$$\mathbb{E}\, e^{\xi \|\Pi_K(g)\|^2} \le \exp\left( \frac{(1 - 2\xi)\, \xi\, \delta(K)}{1 - 4\xi} \right) \quad \text{for } 0 < \xi < \tfrac{1}{4}.$$

Introduce the latter inequality into (D.4) to reach

$$\mathbb{E}\, e^{-\xi F(g)} \le \exp\left( \frac{2 \xi^2 \delta(K)}{1 - 4\xi} \right).$$

This estimate addresses the remaining part of the parameter range in (D.2). □


With  this  result  at  hand,  we  can  easily  prove  the  tail  bound  for  the  projection  of  a  spherical  random  variable 
onto  a  cone. 

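Proof of Lemma 6.3. For a parameter $\xi > 0$, the Laplace transform method [Bar05, Sec. 2] delivers

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda - \xi\delta(C)} \cdot \mathbb{E}\, e^{\xi d \|\Pi_C(\theta)\|^2}.$$

Let $R$ be a chi random variable with $d$ degrees of freedom, independent from $\theta$. Using Jensen's inequality, we
can bound the expectation:

$$\mathbb{E}\, e^{\xi d \|\Pi_C(\theta)\|^2} = \mathbb{E}\, e^{\xi (\mathbb{E} R^2) \|\Pi_C(\theta)\|^2} \le \mathbb{E}\, e^{\xi \|\Pi_C(R\theta)\|^2} = \mathbb{E}\, e^{\xi \|\Pi_C(g)\|^2}.$$

Combining these results, we obtain

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \mathbb{E}\, e^{\xi (\|\Pi_C(g)\|^2 - \delta(C))}. \tag{D.5}$$

Substitute the inequality for the moment generating function (D.2) with $K = C$ into (D.5) to reach

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \exp\left( \frac{2 \xi^2 \delta(C)}{1 - 4\xi} \right) \quad \text{for } 0 < \xi < \tfrac{1}{4}.$$

Select $\xi = \lambda / (4\delta(C) + 4\lambda)$ to complete the first half of the argument.

The second half of the proof results in an analogous bound with $C$ replaced by $C^\circ$. Note that

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} = \mathbb{P}\{ d (\|\Pi_C(\theta)\|^2 - 1) + (d - \delta(C)) \ge \lambda \} = \mathbb{P}\{ \delta(C^\circ) - d\, \|\Pi_{C^\circ}(\theta)\|^2 \ge \lambda \}.$$

The second relation follows from the Pythagorean identity (3.8) and the totality law (4.4). Repeating the
Laplace transform argument from above, with $\xi > 0$, we obtain the inequality

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \mathbb{E}\, e^{-\xi (\|\Pi_{C^\circ}(g)\|^2 - \delta(C^\circ))}. \tag{D.6}$$

Introduce the bound (D.2) with $K = C^\circ$ into (D.6) to see that

$$\mathbb{P}\{ d\, \|\Pi_C(\theta)\|^2 \ge \delta(C) + \lambda \} \le e^{-\xi\lambda} \cdot \exp\left( \frac{2 \xi^2 \delta(C^\circ)}{1 - 4\xi} \right) \quad \text{for } 0 < \xi < \tfrac{1}{4}.$$

Choose $\xi = \lambda / (4\delta(C^\circ) + 4\lambda)$ to complete the second half of the argument. □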
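The effect of the selected parameter can be verified by direct substitution. With $\xi = \lambda/(4\delta + 4\lambda)$, the exponent $-\xi\lambda + 2\xi^2\delta/(1 - 4\xi)$ collapses to $-\lambda^2 / (8(\delta + \lambda))$, where $\delta$ stands for $\delta(C)$ in the first half and $\delta(C^\circ)$ in the second. The sketch below checks this identity in exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction

def chernoff_exponent(xi, delta, lam):
    """Exponent of e^{-xi*lam} * exp(2*xi^2*delta / (1 - 4*xi)), valid for 0 < xi < 1/4."""
    return -xi * lam + 2 * xi ** 2 * delta / (1 - 4 * xi)

def selected_xi(delta, lam):
    """The parameter value chosen at the end of each half of the proof."""
    return lam / (4 * delta + 4 * lam)

def closed_form_exponent(delta, lam):
    """Claimed simplified exponent: -lam^2 / (8 * (delta + lam))."""
    return -lam ** 2 / (8 * (delta + lam))

if __name__ == "__main__":
    delta, lam = Fraction(10), Fraction(3)
    xi = selected_xi(delta, lam)
    print(xi, chernoff_exponent(xi, delta, lam), closed_form_exponent(delta, lam))
```

For $\delta = 10$ and $\lambda = 3$, the selected value is $\xi = 3/52$, which lies in $(0, 1/4)$ as the bound requires.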
D.4. Tail functionals of a product. Finally, we argue that the tail functional of a product cone is controlled
by the tail functionals of the two summands. This result is very simple, and it can surely be strengthened by
taking advantage of the structure of the product.

Proof of Lemma 7.2. Let $C, K \in \mathcal{C}_d$ be closed convex cones. Define independent random variables $X$ and $Y$
that take values in $\{0, 1, \dots, d\}$ and whose distributions are given by the intrinsic volumes of the cones $C$ and $K$,
respectively. That is,

$$\mathbb{P}\{ X = k \} = v_k(C) \quad \text{and} \quad \mathbb{P}\{ Y = k \} = v_k(K) \quad \text{for } k = 0, 1, 2, \dots, d.$$

According to the rule (5.4) for the intrinsic volumes of a product cone,

$$\mathbb{P}\{ X + Y = k \} = v_k(C \times K) \quad \text{for } k = 0, 1, 2, \dots, 2d.$$

By dint of this identity, we can use probabilistic reasoning to bound the tail functionals of the product cone $C \times K$.
Indeed, observe that

$$\mathbb{P}\{ X + Y \ge \delta(C) + \delta(K) + 2\lambda \} \le \mathbb{P}\{ X \ge \delta(C) + \lambda \} + \mathbb{P}\{ Y \ge \delta(K) + \lambda \}.$$

We can rewrite this inequality in terms of tail functionals:

$$t_{\lceil \delta(C) + \delta(K) + 2\lambda \rceil}(C \times K) \le t_{\lceil \delta(C) + \lambda \rceil}(C) + t_{\lceil \delta(K) + \lambda \rceil}(K).$$

This is the advertised conclusion. □
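The probabilistic argument translates directly into a small computation: convolve the two intrinsic volume sequences to obtain those of the product, then compare tails. The sketch below does this for two nonnegative orthants, again taking as given that the orthant in $\mathbb{R}^d$ has intrinsic volumes $2^{-d}\binom{d}{k}$. The cones have different ambient dimensions here, which the convolution rule handles without change. All helper names are hypothetical.

```python
import math
from fractions import Fraction

def orthant_volumes(d):
    """Assumed closed form for the nonnegative orthant: v_k = binom(d, k) / 2^d."""
    return [Fraction(math.comb(d, k), 2 ** d) for k in range(d + 1)]

def product_volumes(vc, vk):
    """Intrinsic volumes of the product cone: the distribution of X + Y."""
    out = [Fraction(0)] * (len(vc) + len(vk) - 1)
    for i, a in enumerate(vc):
        for j, b in enumerate(vk):
            out[i + j] += a * b
    return out

def statistical_dimension(v):
    return sum(k * p for k, p in enumerate(v))

def tail_at(v, threshold):
    """Tail functional t_k with k = ceil(threshold); zero if k exceeds the top index."""
    k = max(0, math.ceil(threshold))
    return sum(v[k:], Fraction(0))

def tail_bound_holds(vc, vk, lam):
    """Check the conclusion of Lemma 7.2 for a given lambda >= 0."""
    dc, dk = statistical_dimension(vc), statistical_dimension(vk)
    lhs = tail_at(product_volumes(vc, vk), dc + dk + 2 * lam)
    rhs = tail_at(vc, dc + lam) + tail_at(vk, dk + lam)
    return lhs <= rhs

if __name__ == "__main__":
    vc, vk = orthant_volumes(5), orthant_volumes(8)
    for lam in (Fraction(1, 2), Fraction(1), Fraction(2)):
        print(lam, tail_bound_holds(vc, vk, lam))
```

Since the product of two orthants is again an orthant, the convolved sequence should coincide with `orthant_volumes(13)`, which serves as a check on the convolution.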

Appendix  E.  The  relationship  between  statistical  dimension  and  Gaussian  width 

This  appendix  contains  a  short  proof  of  Proposition  10.1,  which  states  that  the  Gaussian  width  of  a  spherical 
convex  set  is  comparable  with  the  statistical  dimension  of  the  cone  generated  by  the  set. 

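Proof of Proposition 10.1. Let $C \in \mathcal{C}_d$ be a closed convex cone. It is easy to check that the statistical dimension
of $C$ dominates its Gaussian width:

$$w^2(C) := \left[ \mathbb{E} \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle \right]^2 \le \left[ \mathbb{E} \sup_{x \in C \cap B^d} \langle g, x \rangle \right]^2 = \left[ \mathbb{E}\, \mathrm{dist}(g, C^\circ) \right]^2 \le \mathbb{E}\left[ \mathrm{dist}^2(g, C^\circ) \right] = \delta(C).$$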

The  first  inequality  holds  because  we  have  enlarged  the  range  of  the  supremum,  and  the  second  relation 
depends  on  a  standard  duality  argument  [CRPW12,  Eqns.  (30),  (33)].  Afterward,  we  invoke  Jensen’s 
inequality,  and  we  recognize  the  polar  form  (4.4)  of  the  statistical  dimension. 

For the reverse inequality, define the random variable $Z := Z(g) := \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle$, and note that
$w(C) = \mathbb{E} Z$. The function $g \mapsto Z(g)$ is 1-Lipschitz because the supremum occurs over a subset of the Euclidean
unit sphere. Therefore, we can bound the fluctuation of $Z$ as follows.

$$\mathbb{E}[Z^2] - w^2(C) = \mathbb{E}\left[ (Z - \mathbb{E} Z)^2 \right] = \mathrm{Var}(Z) \le 1. \tag{E.1}$$

The last inequality follows from Fact B.3.

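As a consequence of (E.1), we obtain the required bound $\delta(C) \le w^2(C) + 1$ as soon as we verify that
$\delta(C) \le \mathbb{E}[Z^2]$. Since $Z^2$ is a nonnegative random variable,

$$\mathbb{E}[Z^2] \ge \mathbb{E}\left[ Z^2 \cdot \mathbf{1}_{\mathbb{R}^d \setminus C^\circ}(g) \right] = \mathbb{E}\left[ \Big( \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle \Big)^2 \cdot \mathbf{1}_{\mathbb{R}^d \setminus C^\circ}(g) \right],$$

where $\mathbf{1}_E$ denotes the indicator of the event $E$. We claim that the right-hand side of this inequality equals
the statistical dimension $\delta(C)$. Indeed, for any $u \notin C^\circ$, a homogeneity argument delivers $\sup_{x \in C \cap S^{d-1}} \langle u, x \rangle =
\sup_{x \in C \cap B^d} \langle u, x \rangle$. On the other hand, when $u \in C^\circ$, we have the relation $\sup_{x \in C \cap B^d} \langle u, x \rangle = 0$. Combine
these observations to reach

$$\mathbb{E}[Z^2] \ge \mathbb{E}\left[ \Big( \sup_{x \in C \cap S^{d-1}} \langle g, x \rangle \Big)^2 \cdot \mathbf{1}_{\mathbb{R}^d \setminus C^\circ}(g) \right] = \mathbb{E}\left[ \Big( \sup_{x \in C \cap B^d} \langle g, x \rangle \Big)^2 \right] = \mathbb{E}\left[ \mathrm{dist}^2(g, C^\circ) \right].$$

The last relation follows from the same duality argument mentioned above. On account of (4.4), we identify
the right-hand side as the statistical dimension $\delta(C)$. □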
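The two-sided comparison $w^2(C) \le \delta(C) \le w^2(C) + 1$ can be illustrated by simulation on the nonnegative orthant $C = \mathbb{R}^d_+$. The sketch below relies on two closed forms worked out for this illustration only: the supremum of $\langle g, x \rangle$ over $C \cap S^{d-1}$ equals the norm of the positive part of $g$ when $g$ has a positive entry and $\max_i g_i$ otherwise, and $\mathrm{dist}^2(g, C^\circ)$ equals the squared norm of the positive part. The function names, sample size, and seed are arbitrary choices, and the printed values are Monte Carlo estimates rather than exact quantities.

```python
import math
import random

def sup_on_sphere_orthant(g):
    """sup of <g, x> over unit vectors x with nonnegative entries."""
    pos_norm = math.sqrt(sum(t * t for t in g if t > 0))
    # If no entry is positive, the supremum is attained at a coordinate vector.
    return pos_norm if pos_norm > 0 else max(g)

def estimate_width_and_dimension(d, trials, seed=0):
    """Monte Carlo estimates of w(C) = E[Z] and delta(C) = E[dist^2(g, polar cone)]."""
    rng = random.Random(seed)
    sum_z = 0.0
    sum_dist2 = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        sum_z += sup_on_sphere_orthant(g)
        sum_dist2 += sum(t * t for t in g if t > 0)
    return sum_z / trials, sum_dist2 / trials

if __name__ == "__main__":
    d = 20
    width, delta = estimate_width_and_dimension(d, trials=20_000)
    print(f"w^2 ~ {width ** 2:.3f}, delta ~ {delta:.3f} (exact value d/2 = {d / 2})")
```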